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ABSTRACT 


Ground-Penetrating  Radar  (GPR)  is  a  powerful  modem  tool  to  examine  the 
structure  and  properties  of  the  media  below  the  ground  surface  within  a  depth  of  30 
meters.  This  study  is  very  important  for  the  environmental  problems  related  to  the 
transport  of  contaminants  in  ground  waters.  Successful  GPR  profiling  of  the  subsurface 
media  yielding  the  correct  information  about  the  distribution  of  permafrost,  water  table, 
and  bedrock  depths  is  the  key  factor  in  ground  water  flow  modeling. 

This  work  attempts  to  develop  a  hierarchical  processing  system  capable  of 
handling  GPR  data  characterized  by  high  degree  of  uncertainty,  natural  physical  ambiguity, 
and,  sometimes,  missing  or  incorrect  entries.  The  hierarchical  nature  of  the  algorithm 
allows  to  split  the  task  of  media  profiling  into  several  consecutive  stages,  each  of  the 
following  has  less  degree  of  uncertainty  than  the  previous  one. 

Neural  Networks  modules  are  designed  to  accomplish  the  two  main  processing 
goals  of  recognizing  the  “subsurface  pattern”  followed  by  the  identification  of  the  depths 
of  the  subsurface  layers  like  permafrost,  ground  water  table,  and  bedrock.  Pre-processing 
procedure  has  the  objective  to  transform  raw  GPR  data  into  a  small  feature  vector 
containing  the  most  representative  and  discriminative  features  of  the  signal.  The  feature 
vector  coupled  with  other  relevant  GPR  information  forms  the  input  for  the  Neural 
Network  processing  units. 

The  separate  system  components  are  implemented  in  software  and  tested  with 
artificial  as  well  as  real  GPR  data.  The  entire  processing  system  is  trained  with  synthetic 
inputs  and  tested  with  real  measured  data.  The  algorithm  demonstrates  correct,  accurate, 
and  tolerant  to  noise/error  performance.  This  establishes  the  feasibility  of  the  system 
application  to  true-to-life  problems.  The  developed  software  proved  to  be  fast  enough  to 
consider  the  possibility  of  real-time  implementation  for  field  operation  in  conjunction  with 
the  industrial  GPR  equipment. 
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1. 


INTRODUCTION 


1.1  GPR  technique 

Ground  Penetrating  Radar  (GPR)  is  an  electromagnetic  remote  sensing  technique 
which  uses  radio  waves,  typically  in  the  10  to  2500  MHz  frequency  range,  to  locate  and 
map  different  features  and  structures  below  the  ground  surface  (bgs).  In  general,  a  GPR 
system  transmits  a  short  electromagnetic  pulse  into  the  ground  -  the  pulse  is  reflected, 
refracted  or  scattered  by  the  targets  that  exhibit  some  difference  in  electrical  properties 
(dielectric  permittivity,  conductivity,  and  magnetic  permeability)  and  is  then  recorded  by 
the  receiving  antennas.  The  greater  is  the  difference  in  the  dielectric  permeability,  the 
larger  is  the  amplitude  of  the  reflection  pulse. 

High  radar  frequencies  are  needed  to  achieve  a  good  spatial  resolution,  but 
penetration  depth  of  the  electric  field  is  inversely  proportional  to  the  frequency.  Hence  the 
choice  of  frequency  range  is  a  trade-off  between  resolution  and  penetration  depth. 
Penetration  depth  also  depends  on  the  nature  of  the  soil,  which  has  different  attenuation 
properties.  For  example,  desert  sand  has  an  attenuation  of  about  1  dB/m  for  a  1  GHz 

frequency,  clay  has  an  attenuation  of  100  dB/m 
at  the  same  frequency. 

The  reflected  wave  is  sampled  and 
digitized  by  an  A/D  converter  to  form  a  vector. 
Typically  512  or  1024  points  are  taken  through 
the  region  of  interest.  The  recorded  signal  in 
the  time  domain  is  called  an  A-scan  and  it  may 
look  similar  to  that  shown  in  the  Figure  1-1. 

In  many  GPR  applications  the  A-scans  are 
recorded  consecutively  along  some  spatial 
direction  usually  called  radar  or  transect  line. 

Typical  GPR  system  records  5  to  10  scans  per  meter.  A  variety  of  software 
packages  exist  for  visualization  and  data  processing  (see  Section  1.3  for  more  details). 


Figure  1-1.  Typical  GPR  A-scan. 
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Typical  GPR  soundings  are  performed  by  dragging  a  GPR  hardware  package  including 
transmitting  and  receiving  antennas  behind  a  vehicle  -  Figure  1-2. 


Figure  1-2.  Field  measurements  with  the  GPR  hardware  dragged  behind  a  vehicle. 
1.2  Application  of  the  GPR 

GPR  technique  includes  various  scientific,  industrial,  environmental  and  military 
applications,  among  those  are: 

•  Stratigraphic  layers  profiling  • 

(water  table  detection,  etc.); 

•  Ice  thickness  measurements;  • 

•  Buried  objects  detection; 

•  Archaeological  investigations; 

•  Road  investigations; 


Safety  inspections  at  nuclear  power 
plants; 

Anti-personnel  mines  detection. 


•  Fracture  detection  in  hard  rock; 

•  Liquid  contamination  detection; 


This  work  concentrates  on  a  particular  application  -  non-invasive  site 
characterization  of  stratigraphic  layer  depths  in  the  vicinity  of  Fairbanks,  Alaska,  USA. 
One  of  the  most  important  features  of  this  region  is  the  presence  of  the  permanently  frozen 
materials  (permafrost),  which  have  distinct  dielectric  properties.  The  GPR  data  were 
collected  to  identify  horizontal  and  vertical  distributions  of  permafrost  zones,  water  table 
and  bedrock.  GPR  was  chosen  to  accomplish  this  task  since  [1]:  the  scales  of  depths  and 
lateral  variations  of  permafrost  are  too  small  to  resolve  with  seismic  methods,  and  too 
large  for  efficient  mapping  with  electromagnetic  inductance  methods.  Frozen  soils  (mostly 
sands  and  gravels)  are  a  low  loss  propagation  media  for  electromagnetic  waves  at  radio 
frequencies.  Therefore,  saturated  sediments  form  a  continuous  and  highly  reflective 
surface  with  frozen  ground.  In  this  situation  GPR  is  one  of  the  best  tools  to  study  the 
vertical  distribution  of  the  different  subsurface  materials. 

The  ability  to  characterize  the  subsurface  media  is  very  important  for  the 
environmental  problems  related  to  transporting  of  contaminants  with  ground  water. 
Successful  GPR  profiling  plays  a  critical  role  in  the  site  description. 

1.3  Previous  research  efforts 

Significant  research  efforts  in  the  field  of  GPR  were  made  by  CRREL  (US  Army 
Cold  Regions  Research  and  Engineering  Laboratory).  They  performed  Multi-bandwidth 
reflection  profiling  of  discontinuous  permafrost  via  GPR  during  1993-94.  The  GPR 
antennas  bandwidths  centered  near  50,  100  and  300  MHz.  An  area  spanning  8  km  had 
over  100  km  of  GPR  profiles  recorded  [1]. 

The  subsurface  layers  configurations  for  the  area  is  shown  schematically  in  Figure 
1-3.  Variations  in  the  permafrost  structure  may  have  natural  (presence  of  the  river)  or 
artificial  (roads  or  other  human  activities)  origins,  and  those  variations  may  be  significant 
over  a  relatively  short  lateral  scale.  On  the  other  hand  variations  of  the  water  table  and  the 
bedrock  absolute  depth  are  relatively  small.  Along  with  the  above  mentioned  major 
subsurface  components  variations  in  types  of  soils  and  moisture  content  introduce 
additional  difficulties  into  appropriate  data  processing. 


3 


Thaw 

Zone 


\  1  Unfrozen 

|  1  Zone 


Figure  1-3.  Possible  configuration  of  subsurface  layers  [1]. 


The  commercial  software  package  RADAN™  by  Geophysical  Survey  Systems, 
Inc.  was  used  to  view  the  collected  GPR  data.  The  reflection  time  for  one  layer  of  an 
electromagnetic  pulse  from  the  interface  of  materials  with  the  different  dielectric 
properties  can  be  expressed  as: 


where  c  is  the  light  speed,  d  is  the  layer  thickness,  and  s  is  the  corresponding  dielectric 
constant. 

Previously  GPR  line  data  processing  used  RADAN™  software  to  visualize  sets  of 
adjacent  A-scans  coupled  with  a  human  expert  evaluation  based  on  previous  knowledge 
and  generalization  from  the  available  geophysical  data  (Figure  1-4).  This  analysis  was 
augmented  with  the  information  of  the  subsurface  structure  from  a  number  of  boreholes. 
Those  boreholes  were  drilled  throughout  the  area,  and  types  and  properties  of  materials  as 
well  as  corresponding  layers  depths  were  recorded. 


The  previously  mentioned  method  is  highly  subjective.  It  lacks  any  form  of 
automation  and  the  layer  interfaces  identified  by  the  expert  still  lack  depth  information. 
The  time  scale  of  the  GPR  signal  does  not  scale  linearly  with  depth.  That  is  why  a  new 
approach  has  been  developed  for  automatic  stratigraphic  layer  identification  and  depths 
prediction. 

1.4  Hierarchical  neural  network  based  approach 

The  proposed  processing  architecture  has  the  following  advantageous  features: 

•  main  task  is  split  into  several  consecutive  stages  decreasing  the  degree  of  uncertainty 
within  each  step; 

•  data  adaptive  techniques  are  applied  to  the  GPR  signal  transforming  it  into  a  highly 
informative  and  easily  interpretable  feature  vector; 

•  neural  network  modules  used  provide  high  error  and  noise  tolerance. 

A  block  diagram  of  the  approach  is  shown  in  Figure  1-5.  The  major  processing 
units  of  the  data  processing  system  are  neural  networks  (NN).  One  NN  (NN1)  is  designed 
to  perform  best  on  a  classification  problem.  The  second  NN  (NN2)  provides  good 
approximation  ability  for  depth  analysis  within  the  classified  problem.  Both  neural 
networks  belong  to  the  class  of  two-layer  feedforward  networks.  Both  neural  networks 
are  of  the  Multi-Layer  Perceptron  (MLP)  type  trained  with  Backpropagation  or  Scaled 
Conjugate  Gradient  algorithms. 


Figure  1-5.  Block  diagram  of  the  data  processing  architecture. 

Another  important  part  of  this  architecture  is  the  pre-processing  stage  -  it  has  four 
separate  blocks  responsible  for  proper  data  handling  and  feature  extraction  to  provide  the 
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neural  networks  with  relevant  input  information. 

In  general  the  system  operation  may  be  described  as:  the  pre-processing  unit 
performs  checking  of  the  data  consistency;  the  second  stage  decomposes  the  initial  A-scan 
with  our  Adaptive  Transform  (AT)  technique  into  a  set  of  data  adaptive  basis  functions; 
the  third  section  incorporates  information  about  the  previous  A-scans  with  the  linear 
regression  method  into  the  current  processing  step.  Then  a  feature  vector  formed  from  the 
coefficients  of  the  AT  decomposition  and  some  other  prior  available  information  is  fed 
into  the  Neural  Network  1  (NN1)  to  recognize  the  “subsurface  pattern”  for  the  current  A- 
scan.  This  subsurface  pattern  reflects  one  of  the  possible  subsurface  layers  configuration 
(similar  to  Figure  4-3).  During  the  next  step  part  of  the  transformed  input  for  the  NN1  and 
already  available  information  about  the  subsurface  pattern  is  used  as  input  for  Neural 
Network  2  (NN2),  which  produces  the  set  of  the  startigraphic  layers  depths. 

A  feedback  routine  is  used  to  account  for  possible  incorrect  recognition  of  the 
pattern  or  other  “alarm”  signals  produced  during  system  operation.  This  feedback  feature 
adds  flexibility  to  the  entire  set-up  that  already  has  as  high  noise/error  tolerant  processing 
units  as  neural  networks. 

1.5  Outline 

The  introduction  briefly  described  the  basics  of  the  GPR  technique,  prospectives  of 
its  application  to  the  stratigraphic  layers  profiling,  some  of  the  previous  research  efforts, 
and  the  general  approach,  which  is  used  in  this  work  to  address  the  problem  under 
consideration. 

Chapter  2  of  this  work  gives  an  overview  of  the  available  experimental  (real)  GPR 
and  supplementary  (boreholes)  information  along  with  the  brief  description  of  the  Finite 
Difference  Time  Domain  numerical  simulations  for  the  Plane  Wave  formulation.  These 
data  are  used  in  all  stages  of  the  system  development:  pre-processing,  training  and  results 
verification. 

Chapter  3  provides  a  theoretical  background  for  the  Multilayer  Perceptron  neural 
network  models  The  chapter  covers  basic  principles,  variations  of  training  algorithms, 
operating  modes  and  details  of  the  implementation. 
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Chapter  4  discusses  the  pre-processing  stage.  Implications  for  choosing  the 
particular  pre-processing  algorithms  like  dimension  reduction  and  discriminative  feature 
extraction  am  reflected.  It  also  consider  seveml  possible  alternatives  (Fourier  Worm, 
Wavelet  Transform)  to  the  Adaptive  Transform  and  the  logical  chain  of  the  concept 

improvement.  , .  , 

chapter  5  presents  deUuls  of  the  MerareMcal  OPR  data  processing  systems,  which 

incorporates  methods  and  techniques  described  in  Chaplets  3  and  4.  The  issues  o 

parameters  choice  and  optimization,  accuracy  and  reliability  of  the  entire  system  are 

discussed.  Results  of  the  tests  for  different  kinds  of  data  are  evaluated. 

Chapters  6  summarizes  the  current  stage  results  and  describes  possible  farther 
work.  Appendix  A  provides  additional  information  about  simulated  OPR  lines  and  tests 
performed.  Appendix  B  gives  a  detailed  classification  of  the  subsurface  patterns  and  then 
description  as  they  were  used  for  simuiadons  dong  with  examples  of  the  typied  A-scans. 
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REAL  AND  SIMULATED  GPR  DATA 

2.  _ _ _ _  _ _ 

2.  REAU  AND  SIMULATED  GPR  DATA 


2.1  Soil  properties  and  layers  nomenclature 

The  term  “Grotmd  Penetrating  Rader"  implies  the  use  of  radar  signals  directed 
below  the  ground  surface.  This  section  summarizes  factual  as 
unsubstantiated  knowledge  about  the  electricrt  properties  for  several  type,  of  sods  and 
soil  structures.  The  properties  that  chamcterize  the  propagation  of  the  electromagne 
waves  in  an  isotropic  nwdium  me  the  dielectric  permittivity  <e)  and  the  electnc 
conductivity  (cl.  The  first  one  is  responsible  for  the  wave  proportion  time,  the  second 
one- fertile  loss  factor. 

According  to  various  references  [1-3]  both  values  me  ve*  sensitive  to  the  soil 
moisture  content.  For  example,  e  for  dry  soil  cm,  be  in  a  range  of  4  -  ».  but  with  a 
moisture  content  of  30%  e  can  be  as  high  as  40.  The  soil  conductivity  affects  mostly  the 
EM  pulse  evanescence  and  has  a  small  effect  on  the  reflections  ofTthe  layers  boundaries. 
Discussion  of  its  properties  is  postponed  to  Section  2.3.2.  Dielectric  permittmty  values 
for  the  most  types  of  soils  (dry,  srturated,  frozen,  unfrozen)  are  presented  m  the  Ta  e 

2.1. 


Table  2.1.  Soil  dielectric  permittivity. 


SOIL  TYPE 

PHRMTTIVITY  VALUE 

Permanently  frozen  materials 

so 

■ 

Tt1 

Unfrozen  saturated  sediments 

12.0  -  45.0 

Dry  soil 

o 

00 

o 

Weathered  bedrock 

>11.0 

Bedrock:  granite,  sandstone 

7.0  -  9.0, 10.0 

Sand:  dry,  15°/.  moisture,  25%  moisture  content 

«3.0,  «9.0,  **25.0 

_ A  C4rril 1 -4Y  whe 

In  opR  appfad0M  ,hc  following  classifications  me  used  (see  Figure  1-4),  where 
soils-  in  this  work  refer  to  sands  and  gravels,  which  constitute  much  of  the  subsurface 


media  in  Fairbanks,  AK: 
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•  Active  Layer  -  first  layer  of  soil  under  the  surface  with  some  organics  present; 

•  Water  Table  (WT)  -  the  top  boundary  of  the  ground  water; 

•  Permafrost  -  permanently  frozen  soils; 

•  Suprapermaffost  Aquifer  -  saturated  soils  above  permafrost; 

•  Subpermafrost  Aquifer  -  saturated  soils  below  permafrost; 

•  Talik  -  unfrozen  zone  within  permafrost; 

•  Thaw  -  zone,  where  permafrost  is  absent  mostly  due  to  the  artificial  reasons; 

•  Weathered  Bedrock  -  zone,  where  bedrock  is  not  solid  and  is  mixed  with  soil; 

•  Bedrock  -  the  continental  platform. 

2.2  Experimental  GPR  and  related  data 

2.2.1  GPR  hardware  system 

Ground-Penetrating  Radar  by  Geophysical  Survey  Systems,  Inc.  SIR  Model  4800 
was  primarily  used  during  the  field  measurements.  Available  collection  rate  ranged  from 
10  to  50  scans/second.  Two  antennas  were  used  for  transmitting  and  receiving  the  signal. 
The  transmitter  antenna  radiated  a  broadband  wavelet  pulse  with  a  duration  ranging  from 
about  1.5  to  2.5  cycles.  An  identical,  but  separate,  antenna  was  used  as  a  receiver, 
because  echoes  could  return  from  near  surface  events  before  the  transmitter  antenna  had 
stopped  radiating.  A  fiber  optic  connection  was  used  to  trigger  the  transmission. 
Measurements  with  wavelet  bandwidths  centered  near  50,  100,  and  300  MHz  were  used 
employing  a  custom  made  antenna  and  industrial  antennas  GSSI  Model  3207,  GSSI 
Model  3102,  respectively.  Antennas  were  dragged  approximately  10  meters  behind  a 
vehicle  moving  at  a  speed  about  1  meter/second. 

2.2.2  A-scans,  radar  lines,  borehole  data 

A-scans  or  simply  scans  are  the  natural  output  of  the  GPR  system.  A  set  of  a  few 
hundred  or  thousands  adjacent  A-scans  collected  with  a  frequency  of  5-10  scans  per 
meter  is  called  a  radar  line  (GPR  line).  Data  were  stored  in  binary  files,  which  contained 
some  additional  information  about  the  GPR  system  configuration  and  parameters  at  the 
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time  of  the  measurements  -  the  header  (see  Figure  2-1).  A  fixed  number  of  512 
(sometimes  1024)  sampling  points  were  used  in  the  field  experiments.  Duration  of  the 
scan  could  vary  from  500  to  1000  ns  but  remained  the  same  for  a  single  radar  line. 
Amplitude  of  the  reflections  was  sampled  as  an  8  or  16-bit  signal. 


FILE  NAME  (CR94-96 
Channet(s)  |l 
Samp/Scan  1  512 
Bits/Sample  C  8  <*16 


S cans/S econdv|48.047 


Scans/Metei  ]10. 


Created  Aug.  13  1394. 13:45:44 
Channel  Information  - ---  -  -- 


v-- 


flu 


Meter /Mark  JO-1 
DielConstant  |5~ 


Range  Gain  (dB]  -4  0  0.0  41.0 
43.0  43.0  43.0 
43.0 

Vert  MR  LP  N  =2  F  =140  MHz 
Vert  HR  HP  N  =2  F  =25  MHz 
Horz  HR  Stack  TC  =3 
Static  Stacking  N=2 
Horz  Norm:  Scant/Mark  =  100 
Vert  Triang  LP  F  =80  MHz 


Ft.  Wain wright,  Alaska 

3207  pair  hi  power  and  fiber  optics 


[""  "  ^  Cancel  Save  As 


Modified  Aug  19  1 394  00:54:32 


Antenna  |3207  100MHZ 
3j-  '  Position  |nSJ  j^50- 


Ld 


Range  (nS)  J700. 
Top  (m) 

Depth  (m) 


0. 


W 


3D  Options 


Help 


j 


Figure  2-1.  GPR  line  file  header  content. 

Information  about  more  than  50  boreholes  which  were  made  for  different 
purposes  (research,  monitoring)  was  available.  Most  of  those  boreholes  had 
corresponding  GPR  information.  Depths  of  the  boreholes  varied  from  60  to  170  ft, 
drilling  was  usually  terminated  at  bedrock  or  up  to  45  ft  below  permafrost  in  unfrozen 
sediments. 

Figure  2.2  shows  boreholes  corresponding  to  various  GPR  lines.  The 
Groundwater  Modeling  Software  2.0  (GMS)  was  used  to  process  the  area  map  with  the 
GPR  lines  (white  image  in  the  upper  part)  and  the  finite  element  mesh  of  the  same  area. 
After  transforming  the  image  in  a  way  that  the  river  contour  of  the  map  coincides  with 
the  river  elements  in  the  mesh  the  state  plane  coordinates  become  available  to  locate  the 
beginning  and  the  end  of  the  radar  line  (x,  y).  Then  a  simple  routine  automatically  scans 
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through  the  entire  set  of  the  boreholes  checking  their  coordinates  and  picking  those  that 
are  close  to  the  radar  line  of  interest. 


Figure  2-2.  Draft  area  map  and  the  GMS  mesh  superposition. 

Two  figures  below  show  the  examples  of  the  A-scans  from  radar  line  CR94-61r. 
The  space  locations  of  those  scans  correspond  to  the  boreholes  so  that  the  conclusions 
about  the  nature  of  the  reflection  pulses  can  be  made. 

The  scan  in  Figure  2-3  (total  time  -  1000  ns)  has  two  clear  groups  of  reflections, 
according  to  the  available  borehole  data  (Tables  2.2  -  2.4)  the  first  one  is  likely  to  be 
associated  with  the  top  of  permafrost  at  the  depth  of  4  ft  bgs,  another  reflection  is 
probably  due  to  the  subpermafrost  aquifer  or  intrapermafrost  inclusions. 
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Table  2.2.  Boring  information  (including  WT),  scan  1520. 


Well 

Identifier 

X  state 
plane 

Y  state  plane 

Surface 
elevation,  ft 

Depth  of 
the  hole,  ft 

Weather 

cond. 

Depth  to 
WT 

Date 

AP-6551 

246401 .385 

3969487.000 

443.50 

38.0 

Pt.  Cld,45F 

N/A 

9/15/94 

Table  2.3.  Lithology  information,  scan  1520. 


Well  Identifier 

Layer  index  (from 
top  of  hole) 

depth  to 
layer  top  (ft) 

depth  to  layer 
bottom  (ft) 

Primary 

Lithology 

Lithology  modifier 

005 

38.00 

SAND 

SILTY 

AP-6551 

004 

SAND 

POORLY  GRADED 

AP-6551 

003 

15.50 

SAND 

SILTY 

AP-6551 

002 

0.50 

7.50 

SILT 

AP-6551 

001 

0.00 

0.50 

PEAT 

Table  2.4.  Permafrost  information,  scan  1520. 
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The  A-scan  in  Figure  2-4  has  overlapping  reflections  due  to  the  presence  of 
several  zones  with  and  without  permafrost  and  the  ground  water  table,  transitions  from 
frozen  to  unfrozen  and  especially  to  saturated  zone  cause  strong  reflection  patterns. 
Borehole  information  related  to  this  scan  location  is  presented  in  Tables  2. 5-2. 7 


Table  2.5.  Boring  information  (including  WT),  scan  5196. 


Well 

Identifier 

X  state 
plane 

Y  state  plane 

Surface 
elevation,  ft 

Depth  of 
the  hole,  ft 

Weather 

cond. 

Depth  to 
WT 

Date  i 

AP-6223 

244727.672 

3969556.25 

441 .30 

93.00 

Cloudy, 25 

F 

8.0 

11/8/93 

Table  2.6.  Lithology  information,  scan  5196. 


Well  Identifier 

Layer  index  (from 
top  of  hole) 

depth  to 
layer  top  (ft) 

depth  to  layer 
bottom  (ft) 

Primary 

Lithology 

Lithology  modifier 

AP-6223 

010 

77.50 

93.00 

SAND 

POORLY  GRADED 

AP-6223 

009 

60.00 

77.50 

SAND 

SILTY 

AP-6223 

008 

GRAVEL 

POORLY  GRADED 

AP-6223 

007 

40.50 

GRAVEL 

SILTY 

AP-6223 

006 

32.50 

GRAVEL 

POORLY  GRADED 

AP-6223 

005 

msm 

25.50 

SAND 

POORLY  GRADED 

AP-6223 

004 

16.00 

21.00 

GRAVEL 

POORLY  GRADED 

AP-6223 

003 

11.00 

16.00 

SAND 

POORLY  GRADED 

AP-6223 

002 

3.00 

11.00 

GRAVEL 

POORLY  GRADED 

AP-6223 

001 

0.00 

3.00 

SILT 
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Table  2.7.  Permafrost  information,  scan  5196. 


Well  Identifier 

Layer 
number  (in 
sequence) 

Depth  below  ground 
surface  to  top  of 
layer  (feet) 

Depth  below 
ground  surface 
to  bottom  of 
layer  (feet) 

Zone  Identifier 

AP-6223 

001 

0.00 

1.00 

FROZEN 

AP-6223 

002 

1.00 

38.50 

UNFROZEN 

AP-6223 

003 

38.50 

86.00 

FROZEN 

AP-6223 

004 

86.00 

93.00 

UNFROZEN 

2.2.3  Major  difficulties  with  data  processing 

A  fundamental  problem  for  the  GPR  data  processing  lies  in  the  reflection  time 
versus  s-depth  relation  (Equation  1.1):  for  a  fixed  reflection  time  there  exists  an  almost 
infinite  (limited  only  by  the  physical  constraints)  number  of  the  combinations  of  layer 
depth  and  corresponding  dielectric  permittivity  values.  Our  processing  system  strives  to 
resolve  this  ambiguity,  but  fails  at  times  due  to  poorly  collected  data,  improperly  set  gain 
control  or  other  unknown  conditions  (see  Figure  2-5). 


Figure  2-5.  Uncertain  A-scan  due  to  improper  gain  settings 
(amplitude  “chopping”  example). 

Frequently,  an  8-bit  representation  of  the  signal  was  not  sufficient  for  certain 
cases.  Some  of  these  problems  could  be  resolved:  for  example,  with  some  prior 
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knowledge  the  “chopped”  signal  may  be  rescaled  and  using  the  carrier  frequency  value 
chopped  spikes  can  be  restored.  But  the  concept  of  the  automatic  layers  profiling  system 
implies  no  human  control  of  the  signal  before  it  goes  into  the  processing  system. 

2.3  Simulated  FDTD  data 

For  better  understanding  of  the  underlying  physics  associated  with  the 
electromagnetic  pulse  propagation  through  the  ground,  a  Finite  Difference  Time  Domain 
(FDTD)  simulator  was  developed  [19].  Several  physical  models  were  examined:  plane 
wave  (see  Figure  2.6),  line  source,  and  finite  aperture  source  configurations.  Satisfactory 
agreement  of  simulated  and  real  data  was  achieved  for  the  plane-wave  formulation. 


Potttonl  0  0  Pw«lbnJ 

Figure  2-6.  Simulated  plane-wave  electromagnetic  pulse  in  3D. 

2.3.1  Scattering  and  loss 

Assuming  the  plane-wave  formulation  represented  the  signal  behavior  in  the 
subsurface  medium,  there  was  a  concern  to  account  for  some  physical  concepts  such  as 
scattering  and  loss  in  the  numerical  model. 

Scattering  may  be  responsible  for  two  possible  ways  of  distorting  the  signal  with 
respect  to  the  uniform  lossless  medium  model:  a  change  of  the  dielectric  constant  due  to 
the  presence  of  the  scattering  particles  and  an  energy  dissipation  due  to  the  scattering 
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process  [9],  Loss  may  be  considered  as  an  absorption  effect  in  the  medium  incorporating 
the  imaginary  part  of  the  dielectric  constant. 

A  change  of  the  dielectric  constant  may  be  expressed  as:  £effective  =  e  a(s,n0,r), 

where  s  is  dielectric  constant  without  scattering,  no  is  the  concentration  of  the  spherical 
scattering  particles  of  radius  r.  The  change  value  for  the  case  of  £  —  20.0  is  about  10%  for 
sand/gravel  soil.  This  effect  was  not  implemented  directly  into  the  numerical  model. 
Rather,  the  measured  rvalues  already  accounted  for  scattering  and  other  factors  [1]  was 
used. 
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2.4  Matching  experimental  and  simulated  data 

One  of  the  chief  reasons  to  use  the  simulated  data,  during  the  different  stages  of 
the  entire  processing  system  development,  is  the  lack  of  the  100%  reliable  experimental 
GPR  data.  Even  the  presence  of  the  borehole  information  leaves  some  degree  of 
ambiguity  while  trying  to  assign  certain  s  values  to  the  different  subsurface  layers. 
Simulated  data  allow  one  to  figure  out  the  origin  of  every  single  reflection:  i.e.  for  the 
overlapping  reflections  it  is  much  easier  to  verify  the  decomposition  technique  - 
amplitudes  of  the  separate  reflections  are  known  in  advance. 

Another  critical  reason  for  choosing  the  synthetic  data  is  the  possibility  of 
generating  a  training  set  for  the  neural  network:  this  issue  is  discussed  in  details  in 
Section  5.3.  The  ability  to  match  real  GPR  scans  gives  the  ground  for  the  assumption  that 
a  correctly  chosen  simulated  training  set  provides  an  accurate  representation  of  the 
“entire  space”  of  the  possible  GPR  A-scans  allowing  to  adequately  process  real  GPR 
data. 


Figure  2-7.  Real  and  simulated  scans  on  the  time  scale. 
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Figure  2-7  shows  a  real  scan  from  the  CR94-61r  line  and  the  simulated 
counterpart.  Relatively  good  agreement  is  observed  taking  into  account  some  difference 
between  the  propagating  EM  pulse  shape  in  the  field  measurements,  and  the  synthetic 
initial  pulse  in  FDTD  model.  The  time-depth  diagram  for  the  same  pair  of  scans  is  shown 
in  Figure  2-8. 

DEPTH-TIME  DIAGRAM  FOR  SIMULATION  AND  REAL  DATA 


Figure  2-8.  Depth-time  diagram  for  real  and  simulated  scans. 
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3.  NEURAL  NETWORKS  FOR  CLASSIFICATION  AND 
APPROXIMATION  PROBLEMS 

Supervised  neural  networks  find  many  applications  in  two  general  problem 
formulation:  decision-based  and  approximation/optimization-based  [6],  The  decision- 
based  problems  as  a  final  result  aim  at  a  certain  organization  of  categories  or 
classification  of  the  patterns  into  some  distinct  groups.  Categories  are  usually  labeled  as 
binary  values  and  the  training  procedure  is  based  on  the  correct  classification  for  each 
training  pattern,  synaptic  weights  and  biases  of  the  network  are  adjusted  to  accomplish 
this  task. 

The  approximation-based  problem  has  the  objective  to  map  the  input  space  onto 
the  output  using  a  finite  set  of  data  points  and  then  to  be  able  to  get  an  output  value  for 
any  datum  from  the  input  space  as  close  to  the  correct  value  as  possible.  For  this 
formulation  outputs  are  usually  analog  and  their  exact  values  are  important.  Network 
parameters  are  modified  during  the  training  to  minimize  some  energy  function  that 
depends  on  the  difference  of  the  actual  and  the  desired  network  output. 

Both  formulations  do  not  have  significant  differences  in  their  algorithms  and 
implementation,  but  certain  details  are  changed  to  achieve  the  best  performance  for  each 
task.  In  this  work  the  first  formulation  is  used  for  subsurface  pattern  identification,  the 
second  one  for  identification  of  the  startigraphic  layers  depths. 

3. 1  MultiLayer  Perceptron  (MLP) 

Multilayer  perceptron  is  probably  one  the  most  well-known  neural  networks,  that 
are  trained  with  the  supervised  algorithms.  It  belongs  to  the  class  of  feedforward  layered 
neural  networks  -  it  has  an  input  layer,  output  layer  and  one  or  several  internal  or  hidden 
layers.  Connections  in  this  type  of  network  exist  only  between  the  neurons  of  different 
layers  and  each  neuron  is  connected  to  all  the  neurons  in  the  previous  and  in  the 
following  layers.  There  are  no  lateral  (within  one  layer)  connections  or  feedback.  The 
input  signal  propagates  through  the  network  in  a  forward  direction  on  a  layer-by-layer 
basis. 
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~~  MLPs  found  their  application  in 


Figare  3-1.  MLP  network. 


different  technological  and  other  areas. 
Each  processing  unit  (neuron)  of  the 
MLP  network  can  be  described  with  an 
activation  or  transfer  function,  the 
essential  feature  of  that  function  is  non¬ 
linearity  and  for  many  applications  it 
may  similar  to  the  Equations  3. 1  or  3 .2: 

Ci 

/(“/)“  l  +  eXp{_Mj, 

(3.1) 


/(«,)  = 


H,  +C2  ’ 


(3.2) 


where  ci,  C2  are  constants  and  hj  is  the  output  of  the  neuron  before  applying  the  non¬ 


linearity: 


(3.3) 


where  Xj  is  the  network  input  or  the  output  of  the  previous  layer,  w  and  9  are  the  weights 
and  the  bias  corresponding  to  this  neuron  respectively.  Input  propagation  through  the 
network  is  a  forward  run,  finally  it  produces  a  set  of  outputs  -  activation  signals  of  the 
output  layer  units.  During  the  forward  run  all  the  weight  and  biases  values  remain  fixed. 
During  the  backward  run  weights  and  biases  are  adjusted  according  to  the  error- 
correction  learning  rule.  The  error  is  defined  as  a  sum  of  squared  differences  of  the 
activation  values  of  output  neurons  and  the  target  output  values  provided  by  the  training 
set  The  error  propagates  backward  through  the  network  and  the  weights  and  biases  are 
adjusted  to  reduce  the  cumulative  error.  This  algorithm  is  a  “Backpropagation  of  error’ 
learning  algorithm. 
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'  in  this  work  an  MLP  implementation  with  only  one  hidden  layer  is  considered, 
there  are  two  fact,  that  proves  fee  sufficiency  of  this  assumption.  The  first  h,  a 
by  A.N.  Kolmogorov  [14]  ttat  states  that  a  two-layer  (actually  three  rnput-h.  en-ou» 
network  is  abie  to  perform  mapping  from  an  arbitral  input  space  onto  to arbtb^ 
output  space,  the  only  detail  tot  is  missing  is  to  number  of  neurons  m  to  hrdden  layer, 
but  there  exist  a  configumtion  which  allow  to  perform  a  desired  conec.  tnaPTO 
Another  argurnem  in  support  of  to  assumption  above  is  tot  a  considemble  ma,or  of  to 
current  MLP  applicmions  are  accomplished  with  only  one  hidden  layer  and  acceptable 

results  are  reported  in  many  cases. 

3.2  Backpropagation  learning  algorithm 

As  far  to  derivation  of  to  Baekpropagation  algorithm  is  presented  in  most  of  to 
neural  network  books  [5],  [6]  only  to  brief  summary  is  given  here  and  to  discussron  .s 
focused  on  to  dentil,  of  implementation,  modifications  of  to  algorithm  mid  to  modes 

of  operation.  .  .  . 

Let  (x.  yl)  be  a  single  pattern  in  to  training  set,  M  is  to  number  of  trammg 

patterns.  For  a  single  neuron  winch  output  <e)  and  activation  (a-ffni)  functions  ere 

defined  mi  Equations  3.3uhd  3.2  (3.2  is  used  instead  of  3.1  to  allow  to  mttivation  to  be  m 

to  range  [-1.0,1 .0]).  For  to  input  layer  fl,  (0)  ax,,  for  the  output  layer  a,(L)-y,,L  is 

to  total  number  of  layers.  Nr  number  of  neurons  in  a  layer. 

The  objective  is  to  modify  weights  w,  and  biases  fl  to  minimize  to  error  energy 

function: 

*  <3'4) 
2  m  i 

During  to  backward  run  weight  ratines  (and  biases  also,  but  later  on  only  to 
expression  for  weights  are  presented)  are  updated: 

0+A<(/)  (3*5) 

where  the  superscript  indexes  denote  the  discrete  time  step  or  the  iteration  and 
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A<(0  =  1) ,  (3-6) 

<37> 

where  ^  is  the  error  signal,  rj  is  the  learning  rate  term,  and /’  is  the  derivative  of  the 
activation  function  with  respect  to  a  single  weight. 

The  keys  to  Backpropagation  algorithm  are  the  recursive  formulas: 

=  + \)f(u';n(i + i))H'<ri(/ + 1)  (3.8) 

x’lF’O)  =  +  (3.9) 

Using  this  formulas  error  signal  may  be  calculated  recursively  for  any  neuron  and 
its  weights  and  biases  may  be  updated  accordingly.  The  only  modification  for  the 

standard  Backpropagation  is  the  momentum  term  -  (/) .  Using  the  momentum 

term  implies  adaptation  of  the  learning  step  size  with  respect  to  the  weight  change  on  the 
previous  iteration:  so  flat  spots  on  the  energy  surface  are  traversed  faster,  and  the  step 
decreases  for  the  rough  spots. 

The  Backpropagation  algorithm  described  above  refers  to  the  on-line  mode  when 
the  weights  are  updated  after  presenting  each  single  pattern.  Batch  mode  has  the 
difference  that  the  weight  are  updated  once  in  an  epoch  (a  full  presentation  of  all  training 
patterns).  For  this  mode  all  weights  and  biases  changes  are  summed  over  one  epoch. 

In  this  work  on-line  mode  of  the  Backpropagation  algorithm  was  implemented 
(sometimes  it  is  also  called  “vanilla  backpropagation”  [18]).  But  the  supervised  learning 
of  the  feedforward  networks  is  not  limited  to  the  discussed  methods.  As  far  as  the  task  is 
to  minimize  the  error  energy  function,  and  the  techniques  above  use  only  first  order 
Gradient  Descent  method.  So  the  task  may  be  formulated  as  an  unconstrained  nonlinear 
function  optimization  problem  [5],  The  natural  step  forward  is  to  use  a  second  order  or 
other  modern  procedures  (like  Conjugate  Gradient)  to  accomplish  the  task  of  finding  the 
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acceptable  minimum  on  the  error  surface.  This  extension  was  made  in  the  next  section, 
where  the  Scaled  Conjugate  Gradient  algorithm  is  discussed. 

3.3  Scaled  Conjugate  Gradient  method 

Scaled  Conjugate  Gradient  (SCG)  algorithm  for  fast  BP  training  was  developed 
by  M.  Moller  [7]  in  1991  and  is  one  of  the  most  efficient  in  its  class. 

SCG  belongs  to  the  class  of  Conjugate  Gradient  methods  and  has  the  major 
advantage  in  the  speed  of  convergence  and  the  lack  of  the  parameters  to  be  defined  by  the 
user  before  the  training  procedure.  Most  of  the  algorithms  require  at  least  the  initial 
learning  rate  value,  momentum  term,  etc.  SCG  behaves  in  a  similar  way  as  the  ordinary 
CG  [4], [5],  but  has  a  built-in  feature  that  allow  to  determine  a  step  size  (learning  rate  for 
the  particular  iteration)  in  an  automatic  manner  at  a  relatively  low  computational  cost. 
This  technique  is  applicable  to  the  batch  mode  of  the  BackPropagation  algorithm. 

The  SCG  is  applied  to  the  generalized  weight  vector  consisting  of  all  the  weights 
and  biases  of  the  network  (in  this  work  we  consider  only  a  two-layer  network): 

where  w  denote  the  weights  values,  8  denote  biases  and  superscript  indexes  refer  to  the 
layer  number,  subscript  ones  to  the  number  of  element  in  the  layers,  total  number  of 
components  in  this  vector  is  N. 

Network  energy  function  is  defined  as  the  sum  of  error  squares  (Equation  3.4)  and 


the  derivative  E'(w)  = 


f  p  dE„  X  dE„  dEp  p  dEp 
d0)  ’ 


^,1 

p= l  atfj  p^\  f 


V  p=  i  dWy 


2  ?■* 


%d0l  , 

where  P  is  the  total  number  of  patterns  in  the  training  set  and  p  refers  to  the  Ep  for  the 
particular  pattern.  The  main  feature  of  the  SCG  is  the  approximation  of  the  second  order 
information: 

~  r.Y~  ,  ~  E'(wk+cxkpk)-E'(wk ) 
sk=E"{wk)-pk  * - - - , 

where  pk  is  the  conjugate  direction,  which  is  the  key  to  compute  the  step  size  ak  for  the 
each  component  of  the  weight  vector  updating  -  wi+1  =  wk+  cckpk .  Another  critical 
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feature  of  this  algorithm  is  making  the  Hessian  matrix  to  be  always  positive  definite  by 

adjusting  the  scalar  parameter  Xk  in  the  expression  below: 

~  E'(wk+akpk)-E'(wk) 

sk  = - +  Xkpk  . 

The  SCG  implementation  may  be  summarized  as  the  following: 

1.  Parameters  initialization:  0  <  a  <  10*4,  0  <  <  10"6,  ei  =  10‘8,  82  =  10  10  -  these 

numbers  are  mostly  determined  by  the  precision  of  the  machine,  Xx  =  0 ,  and  the 
initial  weight  vector  w,  is  chosen  (random  values  are  assigned  to  the  weights  and 
biases).  Gradient  direction  px  =  r  =  -£'(w,)  is  calculated,  variable  SUCCESS  set  to 

be  equal  “TRUE”. 

2.  Second  order  information  is  calculated, 

<*k  =<r/\Pk[, 

sk  =  (E'(wk  +  (Jkpk)~  E'(wk ))  /  (7k ; 

Sk=Pl\~, 

3.  Sk  is  scaled  as  Sk=5k+(Xk-Xk )| \pk f . 

4.  If  dk  <  0  then  the  Hessian  matrix  is  made  to  be  a  positive  definite: 

K  =2(4-4 /|ftf); 

4  =-4+4|a|2; 

•4  ~  4  ■ 

5.  Step  size  is  calculated: 

Fk  =  PlK ; 

ak=Mk  !Sk. 

6.  Then  the  comparison  parameter  A  is  calculated: 

A,  =  2Sk[E(wk)  -  E(wk  +  akpk)\ /  p\ . 

7.  If  A*  >  0,  then  the  error  can  be  reduced: 

=™k+  <xkpk ; 
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Xk  -  0 ,  SUCCESS  =  TRUE; 

lf)he  total  number  of  itetoions  so  to  has  reached  N  -  the  algorithm  is  restarted: 
else: 

a«-^+aa- 

If  At  ^  0.75  then  the  scale  parameter  is  reduced  A*  =  4  K  • 

If  A*  <0: 

^  =  4; 

SUCCESS  *  FALSE. 

8.  If  A*  <  0.25  then  the  scale  parameter  is  increased: 

9.  If  the  steepest  descent  direction  r,  *5,  then  *  -  k+1  and  go  to  the  step  2, 
otherwise  the  algorithm  is  terminated  and  the  desired  generalized  weight  vector 

'  is  the  final  output  AmStlter  termination  criterion  which  proved  to  be  valuable  to  avoid 

floating-point  exceptions  is  intmduced  in  [8]  and  implemented  in  this  work: 

2  lifif,,,)  -£(@,)|£  <o(|E(®.,i)| + + A) 

The  SCO  was  not  benchmarked  against  the  regular  Backpropagation  in  this  work 
due  to  the  lack  of  really  large  scale  problems.  For  the  details  on  comparison  with  the 
different  techniques  one  may  refer  to  [7]. 
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4.  PRE-PROCESSING  FOR  FEATURE  EXTRACTION 

For  the  many  signal/data  processing  architectures  a  proper  choice  of  the  pre¬ 
processing  technique  is  one  of  the  most  significant  factors,  which  determines  the 
performance  of  the  entire  system  [4],  Several  pre-processing  procedures  address  the 
following  goals. 

•  to  eliminate  inconsistent  data  from  regular  consideration; 

•  to  eliminate  the  redundant  information  thus  reducing  the  dimension  of  the  problem; 

•  to  enhance  the  most  discriminative  features  of  the  different  data  categories; 

•  to  scale  or  transform  the  data  to  allow  the  main  processing  system  to  operate  in  the 
most  favorable  regime. 

From  now  on  the  discussion  addresses  the  default  GPR  setup:  a  600  ns  A-scan 
recorded  with  a  center  frequency  of  100  MHz,  and  the  term  “soil”  refers  mostly  to  sands 
and  gravels,  unless  stated  otherwise. 

4.1  Correlation  based  pre-processing  for  checking  data 
consistency 

The  goal  of  this  technique  is  to  determine  inconsistent  A-scans  in  order  to 
exclude  them  from  regular  processing.  Inconsistency  means  that  the  current  A-scan 
does  not  fit  the  “trend”  or  appears  to  be  out  of  the  scan  sequence.  To  accomplish  this 
goal  a  cross-correlation  operation  is  performed  for  every  two  adjacent  scans  and  the 

space  derivative  of  the  correlation  value  is  calculated.  If  »  0  -  then  an  alarm 

signal  is  sent  and  the  current  scan  is  marked  as  suspect  (T  is  the  cross-correlation  value, 
r  is  the  distance  along  the  radar  line). 

This  operation  is  performed  in  the  Fourier  domain  to  achieve  shift  invariance. 
Small  changes  in  the  scan  starting  point,  which  could  be  the  result  of  some  bump  on  the 
road  where  the  antenna  is  being  dragged,  may  cause  significant  differences  in  T  values 
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in  the  time  domain,  while  the  Fourier  spectra  of  the  adjacent  scans  remain  almost  the 
same. 


Nl  2 

Here  T  =  '^JFjFj+l ,  where  F  is  the  magnitude  of  the  corresponding  Fourier 

i=l 


coefficient,  N  is  the  number  of  sampling  points  in  the  scan.  Figure  4-1  shows  T  and 


values  for  a  sample  A-scan  sequence. 


Figure  4-1.  Cross-correlation  and  its  derivative  relative  values  vs.  scan  number. 

4.2  Dimension  reduction  and  discriminative  features 
extraction 

The  main  preprocessing  stage  of  the  data  passed  through  the  consistency 
checking  routine  addresses  the  following  issues. 

•  reducing  the  dimension  of  the  problem  -  a  typical  A-scan  is  represented  with  512  - 
1024  sampling  points  -  too  many  for  using  as  inputs  for  the  neural; 

•  preserving  the  chronological  order  of  reflection  pulses  contained  in  the  A-scan; 
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•  enhancing  those  features  in  the  signal  that  reveal  the  most  discriminative 
characteristics  of  the  data  which  allow  one  to  distinguish  between  different  subsurface 
patterns; 

•  incorporating  available  prior  knowledge  about  possible  feature  combinations 
imposing  certain  physical  constraints  (s  values,  layering  order). 

The  dimensional  issue  is  very  important  in  terms  of  the  size  of  the  data  set  for 
training  the  neural  network:  higher  dimension  produces  larger  neural  network.  When 
the  input  vector  dimension  reaches  the  value  of  N  =  40  -  60,  it  becomes  extremely 
difficult  either  to  generate  sufficient  amount  of  synthetic  data.  The  dependency  may  be 
expressed  as  follows:  the  required  number  of  training  patterns  T  should  be  much  larger 
(5-10  times)  than  (NM  +  MO)  -  the  “total  number  of  weights  in  the  network”  [20], 
Here  M  is  the  number  of  neurons  in  the  hidden  layer,  and  0  is  the  number  of  output 
neurons.  For  this  particular  problem  M  is  often  chosen  to  be  about  10.  That  is  why  N  of 
about  50  produces  Ton  the  order  of  103  -  104  making  the  approach  infeasible  in  terms  of 
generation  of  the  required  number  of  training  examples 

4.3  Fourier  Transform 

Application  of  a  Discrete  Fourier  Transform  (DFT)  to  an  A-scan  takes 
advantage  of  the  shift  invariance  property  of  the  FT  component  magnitudes  (discarding 
the  phase)  and  thus  representing  the  scan  by  a  feature  vector  consisting  of  the 
magnitude  values  in  the  frequency  domain  alone.  The  general  difficulty  is  that  this 
procedure  does  not  produce  a  unique  mapping  of  the  scan  into  the  frequency  domain, 
and  two  different  scans  may  turn  out  to  have  very  similar  Fourier  spectra. 

Fourier  spectra  (256-point  FT)  of  four  real  GPR  A-scans  are  shown  in  Figure  4- 
2  (left)  and  one  may  see  the  apparent  distinct  features  of  each  spectrum.  However, 
feature  vectors  constructed  from  the  FT  magnitudes  did  not  provide  an  accurate  pattern 
recognition  (the  data  were  tested  with  the  algorithm  described  in  the  Section  5.3). 
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Figure  4-2.  Fourier  Transform  (left)  for  four  A-scans  (right). 

The  most  serious  argument  against  the  Fourier  transform  based  approach  is  the 
similarity  of  the  Fourier  spectra  for  the  simulated  A-scans  with  different  reflection 
patterns  as  seen  in  Figure  4-3 . 


40.0  140.0  240.0  340.0  440.0  540.0 


Figure  4-3.  Simulated  A-scans  and  their  Fourier  spectra. 
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4.4  Wavelet  Transform 

Wavelet  transform  (WT)  [12]  based  preprocessing  finds  extensive  application  in 
different  pattern  recognition  tasks  [10]  and  some  models  of  wavelet  neural  networks 
were  proposed  and  tested  [11],  WT  has  several  advantageous  features  over  the  Fourier 
transform  in  a  sense  that  WT  represents  the  signal  preserving  the  local  neighborhood 
relations.  WT  is  more  flexible  in  terms  of  choosing  the  wavelet  basis  functions  (Figure 
4-4)  for  better  data  representation. 


Figure  4-4.  Different  types  of  WT  basis:  Haar,  Daubeshies  4,  Daubeshies  16. 

A  conventional  Wavelet  Transform  is  a  decomposition  of  a  signal  into  a  set  of 
orthogonal  basis  functions  performed  in  a  certain  order. 

N 

AO  =  E^*(0  (4.i) 

/=1 

where  A(t)  is  the  A-scan  in  the  time  domain,  w  is  the  weight  of  the  WT  decomposition, 
<p  is  the  corresponding  basis  function,  and  k  may  be  considered  as  a  discretization  level. 
The  shift  and  dilation  values  are  usually  fixed  and  on  every  next  step  of  decomposition 
are  twice  as  small  as  on  the  previous  one.  Typical  A-scan  WT  decomposition  and 
consequent  restoration  as  a  function  of  the  number  of  WT  coefficients  is  shown  in 
Figure  4-5. 
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Figure  4-5.  Restoration  of  an  a-scan  with  fewer  WT  coefficients. 

In  principle  the  feature  vector  for  the  neural  network  input  may  be  constructed 
from  the  WT  components  with  the  largest  magnitudes,  but  the  difficulty  in  this  case  lies 
in  the  inability  to  predict  where  (among  512  coefficients)  those  large  magnitudes  can 
appear  for  an  arbitrary  scan.  That  is  why  the  idea  of  sampling  certain  groups  of 
coefficients  came  into  play.  This  technique  was  implemented  and  groups  of  the  above 
mentioned  WT  coefficients  were  sampled  for  different  radar  lines.  Moreover,  one  could 
observe  the  difference  between  the  final  distributions  of  the  sampled  values  of  those 
groups,  which  were  appropriate  to  be  used  to  distinguish  one  pattern  from  another. 
Unfortunately,  after  processing  a  data  set  of  feature  vectors  constructed  as  described 
above  no  acceptable  pattern  recognition  results  were  obtained. 

The  most  probable  reason  was  that  the  approach  had  a  serious  disadvantage  due 
to  the  fixed  values  of  shifts  and  dilations  of  this  transform  -  different  number  and 
locations  of  the  WT  coefficients  could  occur  for  the  slightly  shifted  in  time  refection 
pattern.  Thus,  for  some  situations  WT  could  produce  relatively  large  amount  of 
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redundant  data  increasing  the  dimension  of  the  feature  vector  and  ceasing  to  pick  the 
most  discriminatory  features  of  the  GPR  data. 

The  next  logical  step  in  improving  the  transform  performance  was  to  allow 
arbitrary  shifts  and  dilations,  but  this  lead  to  the  loss  of  basis  functions  orthogonality 
and  prevented  the  application  of  the  regular  WT  technique  for  the  signal  decomposition 
procedure.  One  alternative  was  the  Adaptive  Wavelet  Transform  [10]  that  gave  more 
freedom  to  shift  and  dilation  values.  But  the  idea  of  using  as  much  prior  information 
about  the  data  as  possible  lead  one  to  reject  the  wavelet  basis  and  instead  choose  the 
best  for  this  specific  problem  transform  basis,  which  turned  out  to  be  the  initial  pulse 
used  in  GPR  data  collection. 

All  those  issues  were  major  reasons  for  developing  a  technique  similar  to  WT 
for  data  adaptive  signal  decomposition  called  an  Adaptive  Transform. 

Recall  that  the  discussion  addresses  the  default  GPR  setup:  a  600  ns  A-scan 
recorded  with  a  center  frequency  of  100  MHz,  and  the  term  “soil”  refers  mostly  to  sands 
and  gravels,  unless  stated  otherwise. 


4.5  Adaptive  Transform 

Adaptive  transform  (AT)  decomposes  the  A-scan  F(t)  into  a  set  of  initial  pulses: 


G(t )  -  X(  wi h (  L)  +  wf  h (  L)+- •  -+wi^i (  ~  ‘  )  .  , 


where  K  is  the  total  number  of  functions  in  the  decomposition,  w,  are  the  weight  values, 
T(t)  -  initial  pulse  (IP),  c,  are  the  shifts,  and  s,  are  the  dilation  values.  Different  values 
for  s  are  due  to  the  dispersion  effect  and  every  next  is  greater  than  the  previous  one  (sj  = 
1.0).  According  to  numerical  experiments  for  realistic  GPR  scan  times  (500  -  1000  ns) 
and  the  actual  subsurface  media  the  dispersion  is  not  greater  than  1.2.  Thus,  as  few  as 
three  or  four  different  dilation  values  may  be  used  for  the  decomposition  (i.e.  s  =  1.0, 
1.1,  1.2).  For  the  case  when  we  do  not  account  for  dispersion  (see  section  2.3  for  the 
discussion  about  the  applicability  of  the  non-dispersive  model)  we  end  up  with  a 
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simplified  decomposition  since  s  =  1.0  and  a  weight  vector  with  the  total  number  of 
components  equal  to  the  number  of  sampling  points  in  the  signal  -  K. 

G(0  =  Iw,J,(^VL)  (4.3) 

J=1  S 

The  residual  \F(t)-G(t)\  doesn’t  have  to  be  close  to  zero  (some  of  the  details  or 
noise  may  be  disregarded  due  to  the  insignificant  influence).  There  is  no  need  to 
reconstruct  the  scan  back  from  the  AT.  Only  the  transform  coefficients  (w,  c,  s )  are  used 
for  the  data  processing.  Basis  functions  of  the  AT  are  not  orthogonal,  that  is  why 
decomposition  cannot  be  done  in  a  way  similar  to  Fourier  or  Wavelet  Transform,  but  it 
can  be  performed  with  the  help  of  the  following  algorithms. 


4.5.1  Correlation-based  method  (Algorithm  1) 


Algorithm  I: 

1 .  find  the  location  tj  of  the  largest  correlation  or  anticorrelation  of  the  IP  and  the  A- 
scan: 


POINTS  Jn  IP 

L 

k 


(4.4) 


ifiNF(tty 

2.  determine  the  sign  and  the  weight  value  for  a  single  basis  function  at  this  location: 

05) 


3 .  subtract  weighted  IP  at  tj  from  the  original  A-scan  F\(t)  =  F(t)  -  Wj  I/t ); 

4.  go  to  step  1  for  Fj(t); 

Algorithm  1  in  its  pure  form  is  ideal  for  data  with  relatively  low  noise  level  and 
a  low  degree  of  possible  overlapping  of  the  reflection  pulses.  The  first  problem  refers  to 
Equation  (4.4)  -  high  noise  may  result  in  the  largest  correlation  value  to  be  shifted  from 
the  actual  position,  where  it  should  occur,  or  in  the  change  of  the  sign  for  the  correlation 
value.  Due  to  the  single  carrier  frequency,  the  high  positive  correlation  in  the  presence 
of  noise  may  be  misrecognized  as  a  1/2  period  shifted  negative  correlation  (in  a  similar 
sense  as  the  sin(x)  may  be  considered  to  be  a  shifted  -sin(x)).  The  second  issue 
introduces  some  error  into  the  weight  value  determined  by  Equation  (4.5)  -  overlapping 
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may  cause  certain  changes  in  the  integral  value  in  the  numerator:  the  larger  is  the  degree 
of  overlapping,  the  less  accurate  the  weight  value. 

These  two  problems  impose  constraints  onto  the  applicability  of  this  algorithm 
for  synthetic  as  well  as  real  data.  Noise  issue  is  absent  in  the  synthetic  data,  but  the 
Noise  Ratio  (NR)  for  real  data  may  vary  from  a  few  percent  at  the  beginning  of  the  scan 
to  100-200%  at  the  last  half  of  it.  The  most  of  the  reliable  GPR  information  is  contained 
in  the  first  half  of  the  scan  (reflections  from  the  top  of  permafrost  and  water  table),  and 
upon  thorough  investigation  of  the  underlying  physics  of  the  EM  wave  propagation 
through  the  soils  some  conclusions  on  the  noise  nature  and  its  statistical  distribution  can 
be  made  and  some  elaborate  filtering  techniques  may  be  applied.  Overlapping  problems 
impose  limitations  onto  the  minimum  spacing  between  the  subsurface  layers  that  the 
algorithm  can  resolve  with  high  accuracy.  The  estimation  for  this  minimum  distance  for 
the  least  possible  s  value  of  about  5  for  the  permafrost  is  about  1.3  meters  for  the  IP  on 
the  order  of  30  ns  for  a  100  MHz  frequency.  The  constraints  and  limitations,  that  were 
discussed  above,  do  not  invalidate  the  AT  concept  study  and  the  entire  system 
performance  evaluation,  since  even  without  the  effective  filtering  technique  the  neural 
network  processing  part  has  a  remarkably  high  noise  and  error  tolerance  (see  Chapter 
5).  Moreover,  the  alternative  method  to  determine  the  weight  values  (though  at  a  higher 
computational  cost),  which  is  discussed  in  the  next  section,  does  not  seem  to  be  superior 
to  the  correlation-based  method  at  least  according  to  the  experiments  performed. 

4.5.2  Optimization-based  method  (Algorithm  2) 

Algorithm  2  is  based  on  a  common  optimization  problem:  we  can  define  an  error 
function  for  the  F(t)  -  the  actual  scan  and  G(t)  -  the  reconstructed  signal  from  the  AT 
coefficients: 

E  =  't(G(t,)-F(tl)f  (4.6) 

7=1 

over  the  K  sampling  points  of  the  signal.  Then,  using  one  of  the  modem  optimization 
techniques  for  E  in  Equation  (4.6)  as  a  cost  function  to  be  minimized  over  the  set  of 
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variable  parameters  {wi,..,wk},  we  can  obtain  a  more  accurate  decomposition  of  the 
scan  signal.  Logical  choice  for  the  optimization  technique  is  a  Conjugate  Gradient  (CG) 
method  [13],  For  faster  convergence  a  decomposition  generated  with  Algorithm  1  may 
be  used  as  an  “initial  guess”.  The  major  drawback  of  this  algorithm,  even  with  optimal 
implementation  in  terms  of  CPU  time,  is  inability  to  use  this  method  for  possible  real¬ 
time  processing  system. 

The  numerical  experiments  with  elaborate  CG  software  developed  by  Sergey 
Perepelitsa  [22]  showed,  that  even  for  a  significant  degree  of  overlapping  (more  than 
half  of  the  reflection  pulse),  the  decomposition  obtained  from  Algorithm  1  cannot  be 
improved  in  terms  of  adjusting  the  weight  values  with  the  fixed  shifts  more  than  by  a 
fraction  of  one  percent.  This  can  be  most  likely  explained  by  the  fact  that  AT  basis 
function  has  a  length  of  30  -  65  sample  points  (the  entire  signal  is  512  sample  points), 
and  changing  of  one  weight  coefficient  affects  the  restored  scan  not  only  in  the  region 
of  the  CG  current  improvement,  but  also  in  the  adjacent  region.  Thus,  very  likely,  this 
problem  requires  some  stochastic  optimization  methods,  but  those  methods  are  not 
considered  in  this  work  due  to  extremely  high  computational  complexity,  so  that  real¬ 
time  implementation  becomes  infeasible. 

4.5.3  Benefits  of  the  AT 

This  approach  has  the  benefit  of  very  low  dimensional  signal  representation  due 
to  the  necessity  of  having  only  one  T(t)  to  represent  a  single  reflection  and  the  property 
of  keeping  almost  all  the  possible  information  about  the  reflection  profile:  w  is  the 
magnitude  and  inversion/no  inversion  information  of  the  reflection  coefficient,  c  is  the 
location  in  time,  5  is  the  relative  dispersion.  The  proposed  technique  proved  to  be 
accurate  for  the  simulated  plane-wave  formulation  data  even  with  thin  layers  (that 
produced  overlapping  reflection  pulses  within  the  limits  discussed  in  the  Section  4.5.2) 
and  for  the  simulation  data  with  artificially  introduced  uniform  random  noise  up  to  10% 
of  the  average  over  nonzero  components  of  the  initial  A-scan  (see  Figure  4-6).  The  only 
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Figure  4-6.  Original  (above)  and  noisy  (below)  synthetic  scans. 


Important  issue  is  that  the  initial  pulse  may  be  represented  not  necessarily  as  an 
analytical  function  but  may  as  well  be  a  numerically  represented  impulse.  This  makes 
the  AT  approach  applicable  either  for  synthetic  or  for  real  GPR  data  as  far  as  in  the 
latter  case  the  IP  may  be  determined  through  identifying  especially  clear  reflection 
pulse  and  scaling  it  to  the  appropriate  amplitude  (see  Section  4.6.2). 

4.6  Adaptive  Transform  for  synthetic  and  real  data 
4.6.1  Performance  for  simulated  data 

The  Adaptive  Transform  technique  was  successfully  applied  to  synthetic  data: 
Figure  4-7  shows  relative  magnitudes  w  of  the  weights  in  the  corresponding  locations. 
This  demonstrated  the  capability  of  the  technique  to  distinguish  not  only  individual 
reflected  pulses  but  also  superimposed  ones  -  wj  and  w?  are  the  two  reflections  from  a 
very  close  interfaces  (1.5  meters  thick  layer  with  s  =  5.0).  The  initial  pulse  of  the  form 
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IP(t)  =  -cos(6^3)cos(ft>?)  (4.7) 

was  used  in  the  FDTD  simulator  and  in  the  AT. 


Figure  4-7.  Adaptive  Transform  weights  and  shifts. 

The  AT  performance  results  for  synthetic  scans  are  neither  very  interesting  nor 
important  because  any  synthetic  scan  can  be  constructed  analytically  from  the  reflected 
initial  pulses.  With  the  known  geometry  and  known  dielectric  constants  the  scan  can  be 
represented  as  a  superposition  of  reflections  from  the  corresponding  layers.  Really 
important  results  that  prove  the  applicability  of  the  Adaptive  Transform  are  for  the 
decomposition  of  the  real  GPR  A-scans,  which  are  discussed  in  the  nest  section. 
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4.6.2  Performance  for  real  data 

In  order  to  approximate  the  Initial  Pulse  a  relatively  clean  reflection  from  the 

radar  line  CR94-61r  was 
extracted.  Mathematica  3.0 
package  was  used  to  approximate 
the  IP  by  the  trigonometric 
fitting  routine  -  up  to  20  terms 
were  used  to  get  an  accurate 
representation.  Figure  4-8  shows 
a  plot  of  this  reflection.  The 
scaled  version  could  be 
considered  as  a  real  Initial  Pulse 
used  in  GPR  measurements.  Also 
this  representation  of  the  real  IP  is  suitable  for  embedding  into  the  FDTD  simulation 
model. 

Application  of  the  Adaptive  Transform  to  real  GPR  data  was  verified  using  the 
scans  from  radar  lines  CR93-11  and  CR94-61r.  Different  initial  pulses  were  extracted 
from  relatively  clean  reflections  due  to  the  fact  that  the  lines  have  different  duration  of 
the  scans  (600  and  1000  ns)  and  different  number  of  sampling  points  (512  and  1024). 

Adaptive  decomposition  was  performed  with  the  correlation  based  algorithm 
and  the  scans  were  restored  from  the  AT  shift  and  weight  coefficients  to  check  the 
accuracy  of  the  technique.  For  both  cases  fewer  than  10  Adaptive  Transform  basis 
functions  were  used.  Figures  4-9  and  4-10  shows  the  original  and  the  restored  scans.  A 
good  agreement  is  observed,  major  reflections  are  fairly  accurate.  Another  remarkable 
property  of  the  Adaptive  Transform  is  demonstrated:  noise  in  the  signal  is  almost 
completely  ignored. 


Figure  4-8.  Real  Initial  Pulse. 
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Figure  4-10.  Original  (left)  and  restored  from  AT  (right)  scan  from  CR93-11. 
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4.7  Feature  vector  from  Adaptive  Transform 

A  low  dimensional  feature  vector  was  constructed  from  the  AT  coefficients  and 
used  as  an  input  for  the  neural  network  processing  units.  Due  to  the  different 
formulations  (classification  and  approximation)  of  NN1  and  NN2,  different  features  of 
the  set  of  AT  coefficients  were  chosen  for  data  representation. 

For  the  classification  oriented  neural  network  the  first  10  largest  in  magnitude 
weights  Wj  in  chronological  order  were  chosen,  then  the  same  weights  were  arranged  in 
descending  order  of  their  magnitude  and  appended  to  the  chronological  set.  Shift  values 
were  not  used  directly,  only  the  order  of  the  particular  weight  appearance  had 
importance. 

The  implications  for  such  a  choice  of  the  feature  vector  are  the  following:  1)  the 
layers  of  the  single  subsurface  pattern  may  be  located  at  different  depths,  2)  particular 
time  of  the  reflection  is  not  useful,  but  the  chronological  order  of  the  weights  and  their 
values  that  contain  the  information  about  the  sign  of  the  reflection  (whether  it  is 
inverted  or  not)  is  useful  and  3)  its  strength  certainly  characterize  some  subsurface 
layers  configuration.  The  same  weights  in  the  order  of  magnitude  provide  some 
additional  information  that  may  become  necessary  when,  for  example,  a  wrong,  but  not 
very  large  coefficient  is  identified  by  the  AT.  In  this  case  the  chronological  order  of 
weights  is  disturbed,  but  the  magnitude  order  remains  the  same  allowing  to  perform 
correct  pattern  recognition. 

The  approximation  oriented  NN2  does  not  require  the  weight  values  for  the 
input  since  the  subsurface  pattern  is  already  identified  and  the  number  and  order  of 
layers  is  known.  Now,  the  exact  shift  values  contain  the  essential  information  about  the 
location  of  the  particular  reflections  and  10  of  them  that  correspond  to  the  first  10 
weights  exceeding  a  certain  threshold  value  are  used  to  construct  the  feature  vector. 

The  construction  of  the  feature  vector  from  Adaptive  Transform  coefficients  is 
not  unique.  However,  the  training  sets  generated  using  the  above  techniques  applied  to 
NN1  and  NN2,  demonstrated  very  good  performance  (see  Sections  5.3  and  5.4). 
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4.8  Compensation  for  variable  gain  in  real  system 

Another  issue,  which  may  be  considered  as  a  pre-processing  stage  is  the 
appropriate  normalization  of  the  A-scans.  One  of  the  ways  to  normalize  the  scan,  which 
is  to  divide  all  the  values  by  the  maximum  amplitude  value  among  all  the  reflections, 
was  not  successful.  In  this  case  the  strongest  reflections  for  each  of  the  two  different  A- 
scans  will  have  the  same  unit  value.  Another  way,  which  also  does  not  work  well,  is  the 
normalization  by  the  average  over  all  points  of  the  scan  value  -  this  approach  fails  due 
to  the  different  total  energy  of  reflections  for  different  scans. 

Our  most  successful  technique  to  normalize  the  A-scan  by  dividing  all  the 
components  by  the  maximum  amplitude  of  the  initial  pulse  radiated  by  the  GPR 
antenna.  In  this  case  all  reflections  will  be  scaled  appropriately  -  smaller  antenna  pulses 
produce  smaller  reflections,  and  the  relative  magnitudes  are  preserved.  Unfortunately, 
the  information  about  the  initial  antenna  pulse  amplitude  was  not  available  at  the  time 
this  work  was  done.  That  is  why  the  problem  of  implementation  of  the  correct 
normalization  technique  remains  open. 
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5.  HIERARCHICAL  GPR  DATA  PROCESSING  SYSTEM 

The  governing  idea  of  the  hierarchical  architecture  is  to  split  the  task  of 
startigraphic  layers  profiling  into  consecutive  stages,  where  each  subsequent  stage  has 
less  degree  of  uncertainty  than  the  previous  one.  The  degree  of  uncertainty  is  interpreted 
in  two  ways,  it  becomes  larger: 

•  if  the  number  of  the  different  possible  outputs  for  an  arbitrary  input; 

•  if  the  dimension  of  the  output  signal  of  a  current  processing  stage  becomes  larger,  it 
makes  more  difficult  to  visualize  and  interpret  the  results; 

The  second  issue  is  almost  as  important  as  the  first  one,  especially  during  the 
development  and  testing  period:  the  ability  of  the  fast  and  revealing  intermediate  result 
evaluation  allows  to  make  the  necessary  changes  in  the  system  in  case  of  bad  datum  or 
another  system  confusion. 

The  entire  GPR  signal  processing  is  divided  into  four  pre-processing  and  two 
processing  stages.  Neural  network  units  are  chosen  to  be  the  main  processing  blocks  due 
to  the  following  remarkable  properties: 

•  fast  operation  in  the  “run”  mode  after  the  training  procedure  has  been  completed. 
That  permits  consideration  of  real-time  implementation; 

•  high  noise/error  tolerance  [16]  which  accounts  for  handling  of  partially  missing  or 
incorrect  data. 

Data  flow  and  the  responsibilities  of  the  particular  processing  units  as  well  as  the 
operation  of  the  entire  system  are  discussed  in  this  chapter. 

5.1  Flowchart 

Figure  5-1  shows  the  detailed  data  flow  diagram  and  interactions  between  the 
separate  components.  Raw  GPR  data  are  used  by  two  pre-processing  parts.  Pre¬ 
processing  0  (see  Section  4.1)  checks  the  data  consistency  and  excludes  inadequate  scans 
from  the  regular  processing.  Pre-processing  1  incorporates  some  information  about  the 
previously  processed  scans  and  is  described  in  Section  5.2. 
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Figure  5-1.  Complete  algorithm  flowchart. 

Pre-processing  2,  a  critical  component,  decomposes  the  A-scans  into  the  small 
feature  vector  which  containe  essential  and  highly  discriminative  information  about  the 
data  (Section  4.5).  The  first  main  processing  unit  Neural  Network  1  (NN1)  is  used  to 
recognize  the  subsurface  pattern  of  the  current  scan  (Section  5.3).  Pre-processing  3 
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(section  5.4.1)  extracts  some  other  useful  feature  from  the  adaptively  decomposed  GPR 
scan  and  transforms  the  result  to  fit  the  Neural  Network  2  (NN2),  which  by  combining 
this  information  and  the  knowledge  of  the  current  subsurface  pattern  determines  the 
depths  of  the  stratigraphic  layers  (Section  5.4).  A  separate  NN2  module  is  required  for 
each  subsurface  pattern. 

Several  interpolation  techniques  can  be  used  to  determine  the  layer  depths  for  the 
inconsistent  data.  If  the  number  of  “bad”  scans  is  significant  -  an  investigation  of  the  soil, 
hardware  or  control  software  adjustment  is  warranted. 

Feedback  possibilities  in  this  processing  algorithm  are  shown  with  dashed  lines. 
The  one  which  goes  from  the  Layer  Depths  to  the  Pre-processing  1  provides  the  history 
of  the  depths  values  to  be  able  to  predict  ones  for  the  current  datum.  The  other  two  from 
Subsurface  Pattern  and  the  Layer  Depths  to  the  Pre-processing  1  allow  to  rerun  the  entire 
algorithm  for  the  particular  scan  if  it  appears  to  be  recognized  incorrectly  or  its 
parameters  determined  inaccurately.  Then  the  necessary  modifications  are  made  in  the 
process  of  adaptive  decomposition  and  the  new  feature  vector  is  generated.  Possible 
implementation  of  those  techniques  are  discussed  in  the  Sections  5.3.7  and  5.4.5  of  this 
Chapter. 

5.2  Tracing  interfaces  depths  technique 

This  technique  (pre-processing  1  on  the  block  diagram)  is  based  on  the  physical 
properties  of  the  subsurface  media  in  the  region  of  interest  -  the  actual  geographic  area. 
Those  properties  imply  relatively  flat  water  table  boundary  (that  is  quite  natural  for  any 
water  surface)  and  relatively  flat  top  of  the  permafrost  -  based  on  the  actual  observations. 
On  the  one  hand,  the  approach  provides  a  statistical  curve  fitting  technique  to  predict  the 
location  of  the  particular  interface  for  the  current  A-scan,  based  on  the  information  about 
several  previous  scans.  On  the  other  hand,  the  tracing  procedure  is  used  for  creating  of 
the  top-down  “depth-based”  feedback  (Section  5.4.5).  If  the  particular  interface  depth, 
identified  with  NN2  unit,  for  the  current  scan  differs  significantly  from  the  corresponding 
depth  for  the  same  interface  determined  by  the  statistical  technique  -  an  alarm  signal  is 
sent  to  the  Adaptive  Transform  stage  to  pay  more  attention  to  the  particular  time  interval 
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that  has  caused  the  mismatch  and  the  necessary  adjustments  for  some  of  the  feature 
vector  components  are  to  be  made.  The  implications  of  this  approach  are  similar  to  the 
topology  based  or  context  based  techniques  in  Optical  Character  Recognition  (OCR) 
applied  to  handwriting  or  poorly  typed  printed  texts:  severely  distorted  single  character 
can  hardly  be  recognized  by  the  OCR  system,  but  some  of  its  resolved  features  coupled 
with  the  information  about  neighboring  characters  (like  continuos  topology  for 
handwriting  or  reasonable  context  that  produces  a  word  with  a  common  meaning)  may 
end  up  with  correct  final  recognition. 

In  this  system  the  technique  of  tracing  the  interfaces  depths  was  implemented  by 
means  of  linear  regression  models  [15].  Relatively  flat  interfaces  (like  water  table) 
require  only  linear  term  (model  1  -  equations  5.1  -  5.3),  and  the  interfaces  that  exhibit 
actually  large  variations  (bottom  permafrost  boundary)  may  be  predicted  with  the  model 
with  the  x2  term  present  (model  2  -  equations  5.4  -  5.7). 

d(x)  =  y  =  a  +  b-x  (5.1) 

;=1  X  ;=1  f=l  / 

2  (5.2) 

In 

a  =  y  —  b-x  (5.3) 

where  the  bar  over  the  terms  means  simple  averaging,  and  n  is  the  number  of  points  used 
for  fitting  the  interface  depth  d(x),  and  x  is  the  normalized  distance  along  the  radar  line. 

d(x)  =  y  =  a  +  bl-x  +  b2'X2  (5.4) 

Zjy'iS*/4  ~'ZxS,t,xf 

U  _  /=!  7=1 _ /=! _ /=! 

(5.5) 


46 


5. 


HIERARCHICAL  GPR  DA  TA  PROCESSING  SYSTEM 


n  n 


'Lx?yl'Lxl-'L*,y ,I>.3 


b2  = 


7—1 


7=1 _ 7=1  7=1 


n  n 


7=1  7=1 


7=1 


a  =  y-brx-b2-x: 


(5.6) 


(5.7) 


The  number  of  points 
(nodes)  to  be  used  for  fitting 
and  the  spacing  between  them 
is  user  defined.  Four  nodes  are 
generally  used  and  the  total 
distance  between  them  does  not 
exceed  4  meters.  Figure  5-2 
illustrates  the  prediction  of  the 
interface  depths  with  model  1 
and  model  2,  respectively. 


Figure  5-2.  Regression  models  performance. 


5.3  Subsurface  pattern  identification 

This  section  covers  the  process  of  subsurface  pattern  identification  starting  with 
the  rules  for  classification  of  the  GPR  data  into  categories,  encoding,  followed  by  the 
description  of  the  Neural  Network  1  architecture,  parameter  choice,  process  of  generating 
a  training  set  from  the  Adaptive  Transform  data,  and  training  itself. 

5.3. 1  Classification  and  encoding  of  the  subsurface  patterns 

For  the  experiments  with  synthetic  data  seven  subsurface  patterns  were  defined 
and  subsequently  used  for  the  training  sets:  the  example  of  the  pattern  is  presented  in 
Table  5.1,  a  complete  set  of  subsurface  patterns  is  provided  in  Appendix  B.  Information 
about  the  season  is  highly  important  when  assigning  the  appropriate  e  values  to  the 
different  layers.  Layers,  that  may  be  frozen  in  spring,  may  be  thawed  in  autumn. 
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changing  significantly  the  reflection  pattern  at  the  beginning  of  the  scan  as  well  as  the 
reflection  times  for  the  deeper  layers. 


Table  5.1.  Example  of  the  subsurface  pattern. 


Pattern  Number 

Season 

Layers 

Dielectric 

constant 

1 

Autumn 

Active  layer  (organics) 

14-25 

Silt  or  sand  (dry) 

<20 

Sand  or  gravels  (saturated) 

20-45 

Permafrost 

4.4 -5.6 

There  is  only  one  rule  for  classification  of  the  possible  combinations  of  the 
stratigraphic  layers  into  subsurface  patterns  -  each  new  combination  of  the  existing  layers 
or  appearance  of  any  additional  layer  generally  leads  to  a  new  pattern. 


Patterns  are  encoded  using  a  1-OF-C  (binary  encoding  for  each  category)  [20] 
encoding  method.  Number  of  outputs  of  the  Neural  Network  1  is  equal  to  the  number  of 
patterns  used  for  this  particular  simulation: 


Table  5.2. 1-OF-C  patterns  encoding. 


Output  1 

Output  2 

Output  3 

Pattern  1 

1.0 

0.0 

0.0 

Pattern  2 

0.0 

1.0 

0.0 

Pattern  3 

0.0 

0.0 

1.0 

5.3.2  Generation  of  the  training  set 

The  training  set  was  automatically  generated  from  the  Adaptive  Transform 
coefficients:  shift  and  weight  values.  Information  about  the  patterns  structure  (number  of 
patterns,  number  of  inputs  and  outputs)  was  stored  in  the  training  set  as  well  as  the 
patterns  itself  (please  note,  that  here  the  term  “pattern”  does  not  mean  the  “subsurface 
pattern”,  it  only  means  a  single  training  example  -  one  feature  vector,  that  corresponds  to 
a  single  GPR  A-scan).  A  simple  technique  was  used  to  construct  the  input  (and 
nevertheless  proved  to  be  accurate  enough  for  correct  processing):  10  AT  weight  values 
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in  chronological  order  followed  by  the  same  10  weights  in  the  descending  order  with 
respect  to  their  magnitudes.  Implications  for  this  particular  choice  for  the  feature  vector 
construction  were  discussed  in  Section  4.6.  This  feature  vector  choice  might  not  be  the 
optimal  one,  but  it  was  the  best  at  the  time  this  work  was  done. 

5.3.3  Neural  Network  1  architecture  and  parameters 


The  block  diagram  of  the  Neural  Network  1  operation  is  shown  in  the  Figure  5-3. 


Figure  5-3.  Neural  Network  1  architecture. 

The  architecture  used  was  an  ordinary  feedforward  layered  neural  network  -  a 
Multilayer  Perceptron  with  a  single  hidden  layer.  Number  of  neurons  in  the  input  and  the 
output  layers  correspond  to  the  number  of  inputs  and  outputs  for  a  single  pattern  in  the 
training  set.  One  part,  which  is  a  user  responsibility,  is  to  set  the  appropriate  number  of 
neurons  in  the  hidden  layer  -  there  are  no  absolute  rules  [20],  but  certain  guidelines 
address  the  problem  of  overfitting  [20],  [21].  Overfitting  occurs  when  the  network  is 
“overtrained”  -  it  can  not  exhibit  a  good  generalization  property  due  to  the  fact  that  it  has 
learned  all  the  training  patterns  in  great  details  and  is  not  able  to  perform  well  on  the  data 
it  has  not  seen  before  (not  from  the  training  set).  One  of  the  methods  that  helps  avoid 
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overfitting  is  to  choose  the  total  number  of  weights  (by  setting  the  appropriate  number  of 
hidden  neurons)  in  the  network  to  be  several  times  less  than  the  total  number  of  training 
patterns. 

Two  different  algorithms  were  used  for  training  of  the  network:  on-line 
Backpropagation  and  Scaled  Conjugate  Gradient  (SCG)  methods  covered  in  the  Chapter 
3.  Their  performance  in  terms  of  computational  time  and  resources  was  not  significantly 
different  due  to  the  small  size  of  the  problem.  Since  SCG  is  more  complicated  and  at 
present  implementation  is  less  suitable  for  results  evaluation  and  visualization  and  mostly 
intended  to  solve  large  scale  problems,  the  discussion  on  performance,  noise  and  error 
issues  is  limited  to  the  more  conventional  Backpropagation  method. 

Another  essential  feature  of  the  neural  network  implementation  is  data 
normalization  [5],  Several  normalization  methods  are  applicable  to  the  MLP  networks: 
column  normalization  (each  input  variable  normalized  separately),  full  normalization  (all 
variables  normalized  by  the  same  factor,  i.e.  the  largest  input  value  among  all  the 
variables).  In  this  implementation  full  normalization  was  employed  due  to  the  fact  that 
absolute  values  of  the  inputs  (AT  weights)  were  of  the  same  order.  Column  normalization 
may  be  helpful  for  data  with  a  significant  variables  difference  in  magnitude. 

5.3.4  Neural  Network  1  training  and  performance 

The  training  was  performed  on  a  training  set  generated  from  the  synthetic  GPR 
line  based  on  the  first  (west)  600  meters  of  the  real  radar  line  CR93-11.  Four  different 
subsurface  patterns  were  defined  (patterns  1,  2,  3,  and  4  in  Appendix  A).  Training  set 
with  450  patterns  corresponding  to  the  same  number  of  the  artificial  A-scans  was  used. 

The  network  total  operation  time  (including  training)  for  a  DEC  ALPHA  (175 
MHz)  machine  was  10  seconds  for  the  network  architecture  with  10  hidden  neurons,  and 
7  seconds  for  3  hidden  neurons.  The  convergence  plot  is  presented  in  Figure  5-4.  Both 
networks  each  the  desired  error  level,  but  the  one  with  10  hidden  neurons  did  so  in  fewer 
number  of  iterations. 
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Figure  5-4.  Convergence  of  NN1, 10  hidden  neurons  (left),  3  hidden  neurons  (right). 
5.3.5  Pattern  identification  results 


A  synthetic  test  set  was  used  to  validate  the  network  accuracy  and  generalization. 
Thirty  artificial  scan  examples  representing  all  four  subsurface  configurations,  that  were 
not  presented  to  the  network  as  part  of  the  training  set  were  tested  -  every  single  input 
vector  was  recognized  correctly.  Network  activation  values  (black)  vs.  actual  desired 
outputs  (gray)  are  shown  in  Figure  5-5.  Network  output  was  slightly  smaller  than  the 
actual  value  it  should  be,  only  because  of  the  fact  that  normalized  output  values  (0.5)  and 
(-0.5)  were  the  maximum  and  minimum  values  the  network  had  learned  and,  due  to  its 
properties,  it  couldn’t  extrapolate  beyond  the  range  of  the  values,  that  were  presented 
during  the  training. 
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Figure  5-5.  Neural  Network  1  pattern  recognition  accuracy. 

5.3.6  Testing  with  real  data 

Three  different  A-scans  from  the  GPR  line  CR93-1 1  were  used  to  verify  the  entire 
processing  algorithm  performance  for  the  ultimate  problem,  it  was  designed  for  - 
recognition  of  real  data  by  the  neural  network  trained  with  synthetic  data.  Unfortunately, 
even  with  the  borehole  information  available  it  was  not  always  possible  to  determine  the 
full  information  about  all  the  details  of  the  subsurface  media  profile  corresponding  to  the 
particular  A-scan.  That  is  why  the  evaluation  of  the  NN1  accuracy  for  the  real  data 
recognition  was  made  based  not  on  a  100%  correct  information,  thus,  it  has  a  slightly 
fuzzy  meaning.  Table  5.3  gives  the  example  of  the  tests  result  for  3  A-scans:  scans  1614 
and  2700  very  likely  represent  the  subsurface  configurations  defined  as  Pattern  1  and 
Pattern  2  respectively,  scan  8810  is  extracted  from  the  East  part  of  the  CR93-11  and, 
probably,  does  not  correspond  to  any  of  subsurface  patterns  used  for  this  simulations. 

The  left  part  of  the  Table  5.3  shows  the  activation  values  of  the  output  neurons  as 
a  respond  to  the  feature  vector  generated  with  the  Adaptive  Transform  from  the  actual 
GPR  A-scan.  The  right  part  contains  the  ideal  (normalized)  responses  for  the  four 
subsurface  patterns  used  for  NN1  training.  The  comparison  results  in  correct  recognition 
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for  the  scans  1614  and  2700,  which  very  likely  correspond  to  the  actual  configuration  of 
the  stratigraphic  layers  similar  to  the  patterns  1  and  2,  respectively.  Scan  8810  was  not 
classified  into  any  category,  and  this  does  not  contradict  with  the  assumptions  made. 


Table  5.3.  NN1  recognition  results  for  real  data. 


Figure  5-6.  A-scan  1614  from  GPR  line  CR93-11. 


Figure  5-7.  A-scan  2700  from  GPR  line  CR93-11. 
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5.3.7  Pattern-based  feedback 

Actual  GPR  line  may  be  composed  of  the  scans  that  belong  to  different 
subsurface  patterns,  but,  in  general,  the  properties  of  the  vertical  soil  and  soil  structure 
(like  permafrost)  distribution  do  not  undergo  significant  changes  over  the  distance  of 
meters  or  tens  of  meters.  Thus,  with  the  collection  rate  of  5-10  scans  per  meter,  the 
adjacent  scans  very  likely  represent  the  same  subsurface  pattern,  and  occurrence  of  the 
other  pattern  is  most  probably  due  to  the  incorrect  pattern  recognition. 

When  the  different  subsurface  pattern  occurs  the  proposed  feedback  generates  an 
alarm  signal  and  sends  it  to  the  Adaptive  Transform  pre-processing  stage.  AT  addresses 
the  correlation  part  (Section  4.5.1)  and  checks  for  the  high  correlation  values,  that  did  not 
take  part  in  the  scan  decomposition  because  they  were  suppressed  by  the  other  slightly 
higher  correlations  nearby.  This  could  be  caused  by  noise  or  other  factors.  On  the  next 
step  a  different  correlation  is  chosen  suppressing  the  formerly  dominating  one  and  the 
decomposition  is  performed  again,  followed  by  the  new  attempt  of  subsurface  pattern 
recognition  by  NN1. 

This  method  has  not  been  implemented  in  software  partly  because  of  the  lack  of 
availability  of  some  essential  GPR  hardware  related  information  preventing  the 
development  and  implementation  of  the  fully  automatic  data  processing  systems.  Please 
refer  to  Chapter  6  for  further  work  discussion. 


54 


5. 


HIERARCHICAL  GPR  DATA  PROCESSING  SYSTEM 


5.4  Determining  depths  of  the  subsurface  layers 

This  section  discusses  the  second  part  of  the  main  processing  architecture, 
developed  for  the  identification  of  the  depths  of  the  particular  layers  for  the  known 
subsurface  pattern.  The  general  idea  of  this  approach  is  expressed  in  the  Figure  5-9. 


'ADAPTIVE  TRANSFORM  SHIFTS  > 

IN  TIME  ORDER  FOR  WHICH 

CORRESPONDING  WEIGHTS  EXCEED 
THE  MAGNITUDE  THRESHOLD  OF  3% 

OF  THE  INITIAL  PULSE  MAGNITUDE 
.  (AFTER  APPROPRIATE  NORMALIZATION) u 


SUBSURFACE  PATTERN 
INCORPORATES  THE 
HNFORMATION  ABOUT: 


SUBSURFACE  LAYERS 


CONFIGURATION : 


Figure  5-9.  Layers  depths  identification  setup. 

5.4. 1  Generation  of  the  training  sets 

The  training  set  for  Neural  Network  2  was  automatically  generated  from  the 
Adaptive  Transform  coefficients.  Another  technique  was  used  for  the  assembling  of  a 
single  training  pattern  due  to  the  different  from  NN1  purpose  of  this  processing  stage  - 
the  approximation  of  certain  continuous  parameters. 

First,  ten  AT  shift  values  which  corresponded  to  the  weights  with  the  magnitude 
exceeding  some  threshold  were  chosen  and  arranged  in  chronological  order.  Then,  for 
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each  of  the  10  variables  the  average  value  was  calculated.  At  the  next  step  each  variable 
was  replaced  by  the  difference  of  its  original  value  and  the  corresponding  average.  The 
same  operation  was  performed  for  the  depth  values  used  as  an  output  for  the  Neural 
Network  2. 

This  procedure  allowed  the  incorporation  of  the  essential  information  about  data 
peculiarities  into  the  input/output  variables,  as  a  consequence  of  the  increased  range  of 
change  in  values:  substitution  of  deviations  for  the  actual  values.  Finally,  this  technique 
for  generating  the  feature  vectors  for  the  training  set  improved  significantly  the  accuracy 
of  the  depth  identification. 


5.4.2  Neural  Network  2  architecture  and  parameters 


Architecture  of  the  Neural  Network  2  is  basically  the  same  as  of  the  Neural 
Network  1  -  a  Multilayer  Perceptron  with  one  hidden  layer.  The  difference  is  in  the 
number  of  neurons  in  the  layers  -  as  far  as  the  information  about  the  subsurface  pattern  is 
already  known,  the  number  of  possible  parameter  distributions  (depth  values)  is  smaller, 
than  for  the  data  set  containing  different  subsurface  patterns,  that  is  why  fewer  inputs 
may  be  used  for  correct  network  operation. 


SHIFTS  DEVIATIONS 
IN  TIME  ORDER: 

{(?,  -/,W), -.,(?! 0  -Of 


"3  outputs"] 


2 -6  HIDDEN 
NEURONS 


DEPTH  OF  THE  TOP  OF 
THE  PERMAFROST 


DEPTH  OF  THE  WATER  TABLE 


DEPTH  OF  THE  BEDROCK 


Figure  5-10.  Neural  Network  2  architecture. 

The  problem  formulation  for  this  processing  part  is  “approximation”.  The 
continuous  parameters  are  used  as  the  target  outputs  for  the  error  correction  during  the 
network  training  instead  of  binary  categories  for  NN1.  But  this  difference  does  not  affect 
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the  training  procedure  with  the  exception  of  the  stopping  criterion.  The  possible 
overfitting  was  more  critical  for  NN2  than  for  NN1  and,  besides  the  appropriate 
architecture  choice,  the  “early  stopping”  method  [20]  is  employed.  This  was 
accomplished  simply  by  making  the  stopping  training  tolerance  higher  than,  for  example, 
the  set  value  for  NN1 . 

5.4.3  Neural  Network  2  performance 

The  network  was  trained  with  the  data  set  generated  from  the  same  CR93-11 
synthetic  GPR  line,  but  only  the  patterns  that  belong  to  the  subsurface  pattern  2  were 
chosen  to  be  included  into  the  training  set.  A  total  of  100  patterns  were  used,  92  of  them 
for  training  purposes,  8  for  verification  of  the  network  performance. 

Networks  with  different  number  of  hidden  neurons  were  tested  and  the 
performance  in  terms  of  final  accuracy  was  measured  (Figure  5-11).  The  results  verify 
the  statements  about  the  appropriate  choice  of  the  number  of  hidden  neurons  -  too  many 
neurons  in  the  hidden  layer  lead  to  overfitting  and  worsen  the  performance  on  the  testing 
set,  the  network  generalization  and  consequently  interpolation  property  becomes  poor  - 
the  results  are  less  accurate.  CPU  time  (DEC  ALPHA  175  MHz,  seconds)  does  not  reveal 
a  lot  of  information  because  larger  networks  have  more  weights  participating  in 
computation  but  it  may  converge  faster  due  to  the  ability  to  store  more  information. 
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Figure  5-11.  NN2  performance  for  different  number  of  hidden  neurons. 


The  network  with  3 
hidden  neurons  was  chosen  as 
a  final  one.  The  plot  of 
convergence  is  shown  in  the 
Figure  5-12. 


Figure  5-12.  NN2  convergence. 
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5.4.4  Depths  determining  results 

Accurate  results  for  the  depths  of  3  layers  for  synthetic  GPR  line  CR93-11 
subsurface  pattern  2  were  generated  as  an  output  of  NN2.  The  average  error  was  about 
4%  for  the  transformed  and  normalized  data.  Due  to  the  discussed  above  technique  for 
the  training  set  generation  and  the  built-in  network  data  normalization,  the  error  was  the 
same  regardless  of  the  absolute  depth  of  the  layer.  The  maximum  absolute  error  for  this 
simulation  of  about  0.5  meters  was  produced  as  a  result  to  approach  the  boundary  of  the 
range  of  the  possible  network  output  values. 


Figure  5-13.  Layer  depths  identification  results  with  NN2. 

The  results  are  presented  in  Figure  5-13,  plotted  values  do  not  reflect  the  exact 
distribution  of  the  layers  because  in  real  life  depths  are  measured  from  the  ground  surface 
-  in  this  simulation  model  they  were  measured  from  the  imaginary  zero  level.  But  the 
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resulting  values  may  be  transformed  into  stratigraphic  information  incorporating  the  data 
about  the  surface  elevation. 

5.4.5  Testing  with  real  data 

Data  from  the  same  GPR  line  (CR93-11),  as  in  the  Section  5.3.6  were  used  to 
verify  the  applicability  of  NN2  module  trained  with  synthetic  data  to  real  test  cases.  NN2 
was  trained  to  determine  the  depths  of  the  subsurface  layers  for  the  scans  that  belong  to 
subsurface  pattern  2,  that  is  why  only  the  scan  2700  was  used.  Results  are  summarized  in 
Table  5.4  below. 


Table  5.4.  Testing  results  for  NN2  for  real  data. 


Scan  number 

2700 

Depth, 

predicted  by  NN2,  m 

Depth  from 
Borehole  data,  m 

Layer  type 

Activation  of  neuron  1 

0.231 

2.25 

2.40 

Water  table 

Activation  of  neuron  2 

0.468 

4.08 

4.50 

Permafrost 

Activation  of  neuron  3 

0.699 

19.88 

25.50 

Bedrock 

The  accuracy  may  be  considered  good  taking  into  account  the  fact,  that  borehole 
data  are  not  100%  reliable,  and  the  scan  location  on  the  GPR  line  might  not  be  very  close 
to  the  hole  drilled.  Nevertheless,  assuming  the  correctness  of  the  experimental  borehole 
information  for  this  case,  the  following  conclusions  can  be  made.  Unlike  the  situation 
when  the  synthetic  data  used  as  the  test  examples,  for  the  real  test  examples  the  error 
increases  with  depth.  There  are  at  least  two  reasons:  the  first  one  accounts  for  the  longer 
traveling  distance  of  the  electromagnetic  wave  reflected  from  the  distant  interface 
resulting  into  more  scatterers  on  its  way,  hence  more  noise  is  accumulated  in  the  second 
half  of  the  A-scan.  The  second  reason  is  concerned  with  the  possible  change  of  the  e 
value  -  the  longer  is  the  distance,  the  more  effect  it  may  produce  on  the  reflection  time 
versus  layer  depth  relation. 


No  extensive  testing  was  performed  mostly  due  to  the  lack  of  the  reliable  and 
accurate  in  all  three  dimensions  experimental  geophysical  information,  which  may  be 
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54  6  Depth-based  feedback 

This  feature  of  the  entire  processing  architecture  can  be  used  for  mo  «fa» 
purposes.  One  is  jus,  *  provide  the  Pre-promsstag  1  with  (bn  current  ***** " 
order  to  trace  the  irderfece  depfes  for  the  next  sc*,  prooessiug.  Another  god  ts  ,i  n 
iTpattcm-hased  feedback  discus,,,  in  Section  5.3.7:  if  be  depfe of  • 
interface  is  determined  and  is  out  of  some  reasonable  range,  an  alarm  sign 
AT  and  the  same  procedure  for  picking  the  differ  cormiation  pom,  ts  ^  The 
ooly  difference  is  that  the  range  of  seruch  for  the  different  high  correlation  vah*  * 
narrowed  down  to  fee  neighborhood  of  fee  reflection  corresponding  to  fee  mterfac.  of  fe 

current  interest 

5.5  Accuracy,  noise  tolerance 

For  fee  hierarchical  architecture  it  is  no.  always  possible  to  evaluate  fee  algmrthm 
accuracy  as  a  single  sysfem  Some  infbnuation  passed  from  one  sfeec  m  another  ,s  binary 

Adaptive  Transform,  Neural  Networks  1  and  2  as  fee  separate  processing  »> 
discussed  in  Sections  4.5.  5.3.6,  and  5.4.4,  respectively.  Thru  section  provr 
estimation  of  fee  system  (Adaptive  Transform  *  NN1)  accuracy  in  terms  of  subsume 

pattern  recognition  sensitivity  to  fee  change  of  the  a  value  for  the 

Dielectric  permittivity  value  for  one  or  mo  media  were  varied  and  fee  table  below 
summarizes  the  results.  These  results  were  generated  by  averaging  over  40  test  examp  e 


5.  HIERARCHICAL  GPR  DA  TA  PROCESSING  SYSTEM 

Table  5.5.  Subsurface  pattern  recognition  sensitivity  to  the  change 
of  the  dielectric  permittivity  value. 

Number  of  mediums  varied  Difference  from  the  original  e  value  when  recognition 

becomes  incorrect,  % 
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6.  CONCLUSIONS 
6.1  Summary 

As  a  result  of  this  research  the  following  separate  Ground-Penetrating  Radar 
(GPR)  data  processing  system  components  have  been  developed  and  implemented  in 
software. 

•  Adaptive  Transform  (AT)  signal  decomposition  technique,  which  allows  the  handling 
of  uncertain  and  noisy  raw  GPR  data  by  extracting  representative  and  discriminative 
features; 

•  Three  additional  pre-processing  methods,  which  provide  the  possibility  of  eliminating 
inconsistent  data,  utilizing  the  information  about  previously  processed  data,  and 
filtering  the  AT  feature  vector  in  an  effective  way; 

•  Two  neural  networks  modules  that  perform  major  processing  functions  based  on  either 
Backpropagation  or  Scaled  Conjugate  Gradient  training  algorithm. 

Performance  of  the  each  component  is  tested  with  synthetic  as  well  as  real  GPR 
data  and  has  demonstrated  acceptable  operation  with  respect  to  accuracy  and  error/noise 
tolerance. 

The  hierarchical  GPR  data  processing  system  is  constructed  from  the  above 
mentioned  components.  Consecutive  operation  of  the  developed  algorithm  stages  allows 
to  decrease  the  degree  of  data  uncertainty  for  each  of  the  following  processing  step. 

The  entire  system,  constructed  and  trained  with  synthetic  data,  is  tested  with  real 
GPR  information.  Successful  results  and  robustness  of  the  algorithm  have  established  the 
feasibility  of  applying  the  approach  to  actual  GPR  related  scientific  and  technological 
problems.  This  research  can  now  become  the  basis  for  further  development  and 
improvement. 
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6.2  Further  work 

Although  the  feasibility  of  applying  the  proposed  system  is  established,  certain 
improvements  and  additions  should  be  made.  The  first  group  of  improvements  relates  to 
both  synthetic  and  real  GPR  data.  The  current  model  for  synthetic  data  generation  may  be 
extended,  by  including  more  physical  factors  when  simulating  soil  and  soil  structure 
electrical  properties.  Factors  such  as  wave  attenuation,  dispersion,  and  scattering  are  the 
primary  goals  in  this  study.  Taking  into  account  these  aspects,  more  understanding  of  the 
underlying  physical  processes  could  be  achieved.  This  may  yield  a  better  understanding  of 
the  behavior  of  electromagnetic  waves  in  subsurface  media  revealing  the  cause  of 
uncertainties  in  GPR  data  related  phenomena.  Certain  improvements  in  the  GPR  data 
collection  technology  in  terms  of  antenna  gain  control  and  availability  of  additional  useful 
information  will  lead  to  overcome  such  difficulties  as  inability  to  correctly  normalize  the 
data  and  define  the  proper  basis  functions  for  the  AT. 

The  second  group  of  improvements  is  algorithm  related.  The  following  ideas  were 
considered,  but  were  not  implemented  due  to  the  lack  of  time. 

•  extension  of  the  AT  into  a  multiresolution  algorithm  permitting  the  handling  of  wave 
dispersion; 

•  development  of  a  more  efficient  way  for  constructing  the  feature  vector  from  AT 
coefficients; 

•  testing  Radial  Basis  Functions  neural  network  paradigm  as  an  approximation 
processing  tool; 

•  implementation  of  a  pattern-based  and  depth-based  feedback  along  with  the 
optimization  technique  to  compensate  for  possible  errors  on  any  processing  stage. 

The  concept  of  a  fully  automatic  GPR  data  processing  system  is  a  promising 
engineering  area  considering  the  growing  demand  in  environmental,  industrial  and  other 
applications.  Development  of  a  working  system,  which  can  operate  in  conjunction  with 
existing  GPR  equipment,  will  certainly  be  of  benefit  to  our  industrial  society. 
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Appendix  A  SIMULATION  RESULTS  FOR 
SYNTHETIC  GPR  LINES 

A.1  Training  set 

A  synthetic  training  set  (main  training  set)  consisted  of  approximately  3300 
artificial  scans  representing  seven  possible  subsurface  patterns  was  generated.  Depths  of 
the  layers  were  varied  within  the  limits  exceeding  possible  depth  variations  for  the 
synthetic  lines  based  on  the  real  geometry  analyzed  with  GPR  technique.  The  subsets  of 
the  training  set  corresponded  to  patterns  1,  2,  3,  5,  6,  and  7  were  used  for  training  of  the 
NN2  models  for  each  of  those  patterns,  number  of  examples  in  each  subset  varied  from 
300  to  600. 

A.2  Synthetic  GPR  line  based  on  CR93-11 

The  approximate  state  coordinates  of  the  starting  and  the  end  points  of  the  line  are 
((243 137.0,  3968939.0),(247863.0,  397091 1.0)}.  Line  direction  is  from  west  to  east.  The 
first  600  meters  of  the  line  were  approximated  with  the  synthetic  geometry.  Then  artificial 
A-scans  were  decomposed  into  the  set  of  Adaptive  Transform  shift  and  weight 
coefficients.  The  geometry,  the  dielectric  permittivity  values,  and  the  subsurface  patterns 
distribution  (red  capital  “P”  with  numbers)  are  shown  in  Figure  A-l. 

Neural  Network  1  trained  with  the  main  training  set  was  used  to  identify 
subsurface  patterns  and  corresponding  layer  depths  for  CR93-11  based  synthetic  line. 
Pattern  identification  results  are  sown  in  Figure  A-2,  depth  predictions  for  patterns  1,  2 
and  3  (using  corresponding  NN2  modules)  -  in  Figures  A-3,  A4,  and  A-5. 
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Figure  Appendix  A  -1.  Geometry  and  dielectric  constants  for  synthetic  CR93-11 

GPR  line. 


Figure  Appendix  A  -2,  Pattern  identification  results  for  CR93-11  GPR  line. 
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Figure  Appendix  A  -5.  Depth  predictions  for  Pattern  3. 


A.3  Synthetic  GPR  line  based  on  CR94-61r 

The  approximate  state  coordinates  of  the  starting  and  the  end  points  of  the  line  are 
{(246291.0,  3968477.0),(246266.0,  3970243.0)}.  Line  direction  is  from  south  to  north. 
The  geometry  and  the  dielectric  permittivity  values  are  shown  in  Figure  A-6,  first  900 
meters  of  the  line  were  used. 

Neural  Network  1  trained  with  the  main  training  set  was  used  to  identify 
subsurface  patterns  and  corresponding  layer  depths  for  CR94-61r  based  synthetic  line. 
Pattern  identification  results  are  sown  in  Figure  A-7,  depth  predictions  for  patterns  5,  6 
and  7  (using  corresponding  NN2  modules)  -  in  Figures  A-8,  A9,  and  A- 10. 
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Figure  Appendix  A  -7.  Pattern  identification  results  for  CR94-61r  GPR  line. 
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Test  scon  number 


Figure  Appendix  A  -10.  Depth  predictions  for  Pattern  7. 
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Appendix  B.  SUBSURFACE  PATTERNS  CLASSIFICATION 

B.1  Pattern  1 


Table  Appendix  B.  .1.  Layers  configuration  for  subsurface  pattern  1. 


Pattern  Number 

Season 

Layers 

Dielectric  constant 

1 

Autumn 

Active  layer  (organics) 

14.0-25.0 

Silt  or  sand  (dry) 

<20.0 

Sand  or  gravels  (saturated) 

20.0-45.0 

Permafrost 

4.4 -5.6 

Figure  Appendix  B.  -1.  Typical  A-scan  from  GPR  line  CR93-11  of  subsurface 

pattern  1. 
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B.2  Pattern  2 


Table  Appendix  B.  .2.  Layers  configuration  for  subsurface  pattern  2. 


Pattern  Number 

Season 

Layers 

Dielectric  constant 

2 

Autumn 

Active  layer  (organics) 

14-25 

Silt  or  sand  (dry) 

<20.0 

Sand  or  gravels  (saturated) 

20.0-45.0 

Permafrost 

4.4 -5.6 

Weathered  Bedrock 

>  11.0 

Figure  Appendix  B.  -2.  Typical  A-scan  from  GPR  line  CR93-11  of  subsurface 

pattern  2. 
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B.3  Pattern  3 


Table  Appendix  B.  .3.  Layers  configuration  for  subsurface  pattern  3. 


Pattern  Number 

Season 

Layers 

Dielectric  constant 

3 

Autumn 

Active  layer  (organics) 

14.0-25.0 

Silt  or  sand  (dry) 

<20.0 

Sand  or  gravels  (saturated) 

20.0-45.0 

Permafrost 

4.4 -5.6 

Cryopeg  or  Intrapermafrost 
diffractions  (fine-grained  sediments) 

>  10.0 

Figure  Appendix  B.  -3.  Typical  A-scan  from  GPR  line  CR93-11  of  subsurface 

pattern  3. 
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BA  Pattern  4 


Table  Appendix  B.  .4.  Layers  configuration  for  subsurface  pattern  4. 


Pattern  Number 

Season 

Layers 

Dielectric  constant 

4 

Autumn 

Active  layer  (organics) 

14.0-25.0 

Silt  or  sand  (dry) 

<20.0 

Sand  or  gravels  (saturated) 

20.0  -  45.0 

Unknown  layers 

Unknown 

properties 

No  typical  A-scan  is  available,  the  pattern  was  used  to  avoid  the  possible  network 
confusion.  Simulated  scans  in  this  area  do  not  match  the  experimental  GPR  data,  the  main 
reason  for  introducing  this  pattern  is  to  somehow  label  the  data,  that  do  not  belong  to  any 
category  currently  used. 
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B.5  Pattern  5 


Table  Appendix  B.  .5.  Layers  configuration  for  subsurface  pattern  5. 


Pattern  Number 

Season 

Layers 

Dielectric  constant 

5 

Spring 

Active  layer  (organics) 

14-25 

Permafrost 

4.4 -5.6 

Unfrozen  sand  or  gravels  (saturated) 

20-45 

Permafrost 

4.4 -5.6 

Figure  Appendix  B.  -4.  Typical  A-scan  from  GPR  line  CR94-61r  of  subsurface 

pattern  5. 
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B.6  Pattern  6 


Table  Appendix  B.  .6.  Layers  configuration  for  subsurface  pattern  6. 


Pattern  Number 

Season 

Layers 

Dielectric  constant 

6 

Spring 

Active  layer  (organics) 

14-25 

Permafrost 

4.4  -5.6 

Unfrozen  sand  or  gravels  (saturated) 

20-45 

Permafrost 

4.4 -5.6 

Cryopeg  or  Intrapermafrost 
diffractions  (fine-grained  sediments) 

>  10.0 

Figure  Appendix  B.  -5.  Typical  A-scan  from  GPR  line  CR94-61r  of  subsurface 

pattern  6. 
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B.7  Pattern  7 


Table  Appendix  B.  .7.  Layers  configuration  for  subsurface  pattern  7. 


Pattern  Number 

Season 

Layers 

Dielectric  constant 

7 

Spring 

Active  layer  (organics) 

14-25 

Permafrost 

4.4  -  5.6 

Unfrozen  sand  or  gravels  (saturated) 

20-45 

Permafrost 

4.4 -5.6 

Weathered  bedrock 

>  11.0 

Figure  Appendix  B.  -6.  Typical  A-scan  from  GPR  line  CR94-61r  of  subsurface 

pattern  7. 
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Appendix  C  SIMULATION  RESULTS  FOR 
REAL  GPR  LINE 

Neural  Network  1  trained  with  the  main  training  set  (Appendix  A)  and  six  Neural 
Network  2  modules  corresponding  to  patterns  1,  2,  3,  5,  6,  7  were  used  to  estimate  the 
patterns  and  depths  for  the  real  GPR  line  CR93-1 1.  The  results  are  shown  in  Figure  C-l. 
Numbers  above  y  =  0  denote  the  subsurface  pattern  number,  below  y  =  0  -  correspond  to 
the  interface  depth  values. 


Scon  number 


Figure  Appendix  C  -1.  Pattern  and  depths  identification  results  for  the  real  GPR 

radar  line  CR93-11. 

The  data  in  the  above  figure  are  somewhat  confusing  due  to  the  pattern  change 
observed  in  the  upper  part  of  the  figure.  If  one  attempts  to  connect  several  of  the  points  to 
sketch  out  the  layer  depths,  the  following  picture  would  be  produced  -  see  Figure  C-2. 

Note,  that  not  all  the  points  were  connected  due  to  uncertainty  of  the  results  as 
explained  in  the  main  text  of  this  report. 
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Scon  number 


Figure  Appendix  C  -2.  An  attempt  to  connect  the  adjacent  points  to  produce  layer 

interfaces. 
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